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Come together 


Cross-continent collaboration in the sciences has become the norm. We must ensure that 


disadvantaged regions are not left out. 


space, scientists who wish to get on in 2016 should make a sim- 

ple resolution for the new year: broaden your horizons. Think 
beyond the conventional format of the academic paper and experi- 
ment with new ways to present data and results. Look past the histori- 
cal boundaries between academic subjects to the emerging landscape 
of interdisciplinarity. And, perhaps most importantly, embrace the 
growing trend of international collaboration. 

The benefits of international partnership are clear. Cross-border 
research receives more attention than does insular work and its pub- 
lications attract more citations. The promise to global science is obvi- 
ous, too: publicly funded research increasingly looks for impact and 
pay back, and many of the most immediate problems that science can 
help with are not defined by national borders. 

Issues of sustainability, health, access to food and water, stable eco- 
systems — the ‘grand challenges’ — are the products of complex chains 
and relationships, natural causes and human effects, across diverse yet 
connected regions. Solutions, and the science to seek these solutions, 
must sprout from a similar network: diverse yet connected. 

The Nature Index 2015 Collaborations supplement published in 
November demonstrates the trend towards collaboration (see go.nature. 
com/nji2gb). Some 70% of the academic papers analysed from the 
University of Cambridge, UK, for example, featured a co-author from 
a different country. It also demonstrates the shifting foundations for 
these international projects, which no longer need to be anchored to the 
usual big players of Europe’s leading lights, Japan and the United States. 
Scientists in Spain and Portugal are forging productive alliances with 
colleagues in South America. Australian researchers are increasingly 
looking to team up with scientists in the Asia-Pacific region. 

This reflects the new, broader geopolitics of the twenty-first century 
—achange neatly illustrated by the climate-change agreement signed 
in Paris last month. Nations such as China, India and Brazil — previ- 
ously defined in climate talks as poor developing countries — have 
taken on a more equal share of the responsibility for the struggle 
against global warming, to match their emerging higher status. 


A mid the pledges to exercise and to keep a tidier office or bench 


ON THE OUTSIDE LOOKING IN 

Not all scientists are benefiting from this era of cooperation. And, as 
bibliometrics specialists Jonathan Adams and Tamar Loach wrote in 
the Nature Index supplement, the cost of missing out can be severe (see 
J. Adams and T. Loach Nature 527, S58-S59; 2015). “If collaboration is 
linked with high impact, then research groups who are not part of the 
collaborative network risk being left behind, marginalized by a lack 
of access to the cutting edge of research in their field” 

Where these excluded scientists live and work will come as no sur- 
prise. Africa remains under-represented in this new world, more heav- 
ily so if the relatively strong part played by South Africa compared 
with the rest of the continent is taken into account. Yet challenges 


do not come much larger than those experienced in the patchwork 
of political, social and economic systems that make up the African 
continent. And as the Ebola virus outbreak has demonstrated, the 
problems of Africa — as well as having immediate and devastating 

local impacts — also challenge the rest of the world. 
How can research and the growing strength of international 
collaboration reach more developing nations? How can we ensure 
that the products of scientific research reach 


“The long- the bulk of humanity who would benefit the 
term solution most? 

to inequality It is no coincidence that China’s arrival 
of opportunity on the global scene and as a desired partner 
is equality of comes on the tail of massive domestic invest- 


ment in research. Many nations in Africa 
(and elsewhere in what is known as the global 
south) cannot or do not want to put serious money into science, and 
academic market forces — like it or not — will continue to drive par- 
ties in the global north elsewhere in search of synergies. 

Instead, scientific investment by rich nations in poor countries and 
regions has long been tied to the development agenda. As such, it 
is, rightly, not judged on scientific output — papers and citations — 
alone. But alliances of unequal partners can be notoriously awkward, 
and so it has proved with research funded in this way. Post-colonial 
paternalism gave way to scientific aid, but that change did not chal- 
lenge the donor-recipient dynamic and the polarizing problems it 
sets up in projects and relationships. In this model, those from the 
north who pay the bills too often decided the research agenda and 
how success will be defined, and those from the south were too often 
expected to fit in, provide the data and be grateful for the opportunity. 

Plenty of players — from government funders and philanthropic 
bodies to institutions and individual project leaders — are taking 
admirable steps to call attention to this kind of inequality and to 
address it. Those efforts deserve praise and support. 

The long-term solution to inequality of opportunity is equality of 
investment. For now, researchers involved with such asymmetric col- 
laborations must ensure that they do not take advantage. As horizons 
expand, so must the professional codes and ethical safeguards that 
reward input with appropriate credit and govern the fair and equitable 
use of data and materials. 

There must also be broader awareness that, just as there is more to 
research than papers, there is more provided to a partnership than 
conventional resources such as cash and equipment. The Nature Index 
supplement profiled an international project that published a genetic 
analysis of humans, chimpanzees and their lice. It quoted a Ugandan 
author on the paper as saying that it would have been impossible with- 
out the support of research partners in the United States and Europe, 
because the Ugandan group did not have the necessary technology. That 
is true. But then the partners did not have the necessary chimps. = 


investment.” 


7 JANUARY 2016 | VOL 529 | NATURE | 5 


© 2016 Macmillan Publishers Limited. All rights reserved 


MICHAEL TEMCHINE 


WORLD VIEW. jennssicnnen 


right — Earth really does move around the Sun. That step towards 

reconciling religious dogma with science took around 380 years. 
What will 2016 bring? Whisper it, but science and the church seem to be 
walking hand-in-hand on one of the defining issues of the twenty-first 
century, and in a way that is truly remarkable. 

The issue is climate change, and the force behind this new reconcili- 
ation is Pope Francis. He termed last month's COP21 global warming 
conference in Paris a “now or never” opportunity, and greeted the news 
that the talks had led to an international agreement by exhorting “the 
whole international community to proceed on the path undertaken in 
the name of an ever more effective solidarity”. 

Such sentiments built directly on the Pope's already famous encyclical 
letter of May 2015: along and somewhat rambling 
critique of modernity that, among other things, 
called for changes of lifestyle, production and con- 
sumption to combat climate change. 

Leading scientists have welcomed the Pope and 
the church into the fold of rationality. Johan Rock- 
strom, for example, the lead author of a widely 
cited article in Nature on planetary boundaries 
(J. Rockstrém et al. Nature 461, 472-475; 2009), 
has noted approvingly: “Pope Francis’ encyclical 
suggests — in line with our analysis — that plan- 
etary stewardship must now be the foundation of 
our values, beliefs and economic systems.” 

But the church’s concern about climate is 
remarkable not so much in how it lines up with 
scientists’ views, but in how it potentially chal- 
lenges them. Indeed, the Pope's moral logic makes 
clear that widespread agreement on the science is much less important 
than is a political environment that welcomes diverse belief systems. 

The effectiveness of the COP21 agreement — heralded as an impor- 
tant breakthrough because all nations have signed up to its aims — will 
depend on the ability of individual nations to reduce carbon emissions 
while still advancing the well-being of their citizens. This means that 
progress on climate will hinge on political decisions about how best to 
pursue both goals. Effective politics will, in turn, demand constructive 
engagement among multiple voices to achieve solutions that all can live 
with. “Solutions,” writes the Pope, “will not emerge from just one way 
of interpreting and transforming reality.” 

Yet the original sin of climate-change policy in the United States was 
that from the beginning it ruled out such pluralism, because scientists 
and environmental activists alike tended to frame action in a way that 
could only alienate economic and social con- 


I n 1992, the Catholic Church formally acknowledged that Galileo was 


servatives. Political rhetoric and policy propos- NATURE.COM 
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IF SCIENCE AND 


RELIGION 
ARE BEGINNING 
TO WALK TOGETHER, 


THE DEVIL 


REMAINS IN THE 


POLITICS. 


Constructive engagement 
is the Key to climate action 


This year, scientists should resolve to follow the lead of Pope Francis and seek 
an inclusive approach to climate change, says Daniel Sarewitz. 


change. From the perspective of US conservatives, it would be hard to 
imagine a more toxic combination of policy ambitions. And because 
scientists and climate activists claimed that science dictated their pol- 
icy agenda, conservatives had every reason to be suspicious about the 
motives of the scientists and the credibility of their science. The legacy 
of that strategy is evident in the uniform scepticism of the Republican 
presidential candidates about global warming. 

The Pope, however, draws direct connections between action on cli- 
mate and conservative US touchstones such as ‘family values. He empha- 
sizes the family as “the basic cell of society” and the starting place for 
action, because that is where “we first learn how to show love and respect 
for life” as well as “respect for the local ecosystem and care for all crea- 
tures”. And because, as ecologically minded people often observe, eve- 
rything is connected to everything else, the Pope 
reasons that “concern for the protection of nature 
is also incompatible with the justification of abor- 
tion. How can we genuinely teach the importance 
of concern for other vulnerable beings, however 
troublesome or inconvenient they may be, if we 
fail to protect a human embryo?” 

These values are likely to make many scientists 
and climate activists squirm. But, for the optimism 
stirred by the COP21 agreement to translate into 
tangible progress, climate politics in the United 
States will have to offer a serious place in the 
debate for the fundamental values that lie behind 
conservatism. As the Pope has shown, such values 
are perfectly compatible with action on climate. 
Along the way, they may even help to expose 
some of the contradictions and incoherence of 
the mainstream climate-change regime. 

For example, climate politics in the United States has often played out 
as a de facto attack on the cultural iconography of conservative middle 
America, such as pick-up trucks and muscle cars. Meanwhile, as people 
concerned about the climate jet off to international conferences and 
ecotourism sites, they can mitigate their guilt with carbon offsets — a 
modern sort of indulgence that the Pope terms a “ploy which permits 
maintaining the excessive consumption of some countries and sectors”. 

In acknowledging the climate problem, the Pope has also shown that 
in this new year, conservative voices and belief systems can begin to 
enter constructively into the climate debate after an absence of two dec- 
ades. Yet, if science and religion are beginning to walk together, the devil 
remains in the politics. And this is where, logically enough, science can 
learn a thing or two from religion. m 


Daniel Sarewitz is co-director of the Consortium for Science, 
Policy and Outcomes at Arizona State University, and is based in 
Washington DC. 

e-mail: daniel.sarewitz@asu.edu 
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SEVEN DAYS nescnsn 


EVENTS 


Periodic addition 
Four new elements have 

been officially added to the 
periodic table, completing its 
seventh row. The International 
Union of Pure and Applied 
Chemistry in Research 
Triangle Park, North Carolina, 
announced on 30 December 
that evidence supporting 

the discoveries of elements 
113, 115,117 and 118 by 
laboratories in Russia, the 
United States and Japan was 
valid. See go.nature.com/ 
vgug27 for more. 


Volkswagen sued 
The US Environmental 
Protection Agency (EPA) 

is taking Volkswagen to 

court after revelations that 
the company fitted vehicles 
with devices that circumvent 
emissions regulations. 
Volkswagen has admitted 
using such ‘defeat devices’ 
and has apologized for 

fitting them to some models 
of Volkswagen, Audi and 
Porsche cars. On 4 January, the 
Department of Justice, acting 
on behalf of the EPA, said that 
nearly 600,000 vehicles sold 
in the United States had used 


the illegal devices, causing 
harmful air pollution and 
violating the US Clean Air Act. 


Guinea Ebola-free 
Ebola virus is no longer 
spreading in Guinea, the 
World Health Organization 
(WHO) declared on 

29 December. The 
announcement came 42 days 
after the West African 
country’s last patient, a 
newborn, tested negative 

for the virus for a second 
time. Health officials will 
now watch closely for flare- 
ups of the deadly disease. 
Last November, a cluster of 
three Ebola cases emerged 

in Liberia, months after the 
WHO had announced the end 
of Ebola transmission there. 


Alfred Gilman dies 


in transmitting signals 

from the outside to the cell’s 
interior. Around 40% of 
pharmaceuticals act by binding 
to specific receptors that are 
coupled to G proteins. Gilman 
was an editor of the textbook 
The Pharmacological Basis 

of Therapeutics, originally 
co-written by his father. 


Fellowship refusal 
The American Association for 
the Advancement of Science 


Pharmacologist Alfred said on 22 December that it 
Goodman Gilman will not award an honorary 
(pictured), who shared fellowship to chemist Patrick 
the 1994 Nobel Prize in Harran, who was prosecuted 
Physiology or Medicine for for the accidental death 

his discovery of G proteins, of 23-year-old researcher 
died on 23 December, aged 74. Sheharbano Sangji in his 

G proteins are attached to lab in 2009. Last November, 
the internal surface of a cell’s Harran was named as one 


membrane and are involved 


of 347 scientists elected to 


HINT OF NEW BOSON SPARKS FLOOD OF PAPERS 


TREND WATCH 


Theoretical physicists are rapidly 
churning out papers as they rush 
to analyse tantalizing hints of a 
new particle — a boson — in data 
from the Large Hadron Collider 
(LHC). Experimental results 
announced on 15 December at 
CERN, which hosts the LHC near 
Geneva, Switzerland, sparked 

a flood of papers posted on the 
preprint server arXiv — 150 

had been published as Nature 
went to press — even though 

the statistical significance of the 
findings is low. See go.nature. 
com/eqmchr for more. 


In just 21 days, physicists have posted 150 papers on the arXiv 
preprint server about tantalizing results at the Large Hadron Collider. 


Number of papers submitted 
to arXiv (cumulative) 


22 Dec 


27 Dec 1 Jan 


2016 


receive the honour. But the 
organization subsequently 
learnt of the death, froma 
chemical fire in Harran’s lab at 
the University of California, 
Los Angeles, and reconsidered 
his nomination. See go.nature. 
com/jsujun for more. 


FACILITIES 


Moon base no more 


Russia’s plans to build a Moon 
base are on hold, according 

to 29 December reports. The 
Russian newspaper Izvestia 
says that a draft revised 
programme to 2025, developed 
by the country’s space agency 
Roscosmos, no longer includes 
plans to create a lunar base, 
along-held goal. The agency 
proposes cutting the budget 
for human Moon missions 

by 20%, or 88.5 billion 

roubles (US$1.2 billion), the 
newspaper adds. Roscosmos 
told the news agency Reuters 
that it was revising the scale of 
its programme, but declined to 
comment on the figures. 


| BUSINESS 
India biotech boost 


India has launched a 

strategy for biotechnology 
development for 2015-20, 
aiming to increase its biotech 
turnover from US$7 billion 

to $100 billion by 2025. 

The country will invest in a 
new generation of biotech 
products, create infrastructure 
for research and development 
and commercialization, and 
establish India as a major 
biomanufacturing hub, science 
and technology minister 
Harsh Vardhan announced on 
30 December. Plans include a 
network of biotech incubators, 
technology-development 
centres and 150 technology- 
transfer organizations. 


> NATURE.COM 
For daily news updates see: 
WwW.nature.com/news 
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eradication slowed by 


Guinea-worm 
epidemic in dogs p.10 


studied for first time in 


Antarctic clouds 
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twelve months p.14 


The science to look 
out for in the coming 


Flocking birds, 
swarming molecules and 
the mathematics of life p.16 
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Precision medicine uses genomic and physiological data to tailor treatments to individuals. 


HEALTH CARE 


China embraces 
precision therapy 


Strong genomics record bodes well for health-care revolution. 


BY DAVID CYRANOSKI 


ing, access to millions of patients and the 
promise of solid governmental support: 
those are the assets that China hopes to bring 
to the nascent field of precision medicine, 
which uses genomic, physiological and other 
data to tailor treatments to individuals. 
Almost exactly one year after US Presi- 
dent Barack Obama announced the Precision 


aes capacity in genome sequenc- 


Medicine Initiative, China is finalizing plans 
for its own, much larger project. But as uni- 
versities and sequencing companies line up to 
gather and analyse the data, some observers 
worry that problems with the nation’s health- 
care infrastructure — in particular a dearth of 
doctors — threaten the effort’s ultimate goal of 
improving patient care. 

Precision medicine harnesses huge amounts 
of clinical data, from genome sequences to 
health records, to determine how drugs affect 
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people in different ways. By enabling physicians 
to target drugs only to those who will benefit, 
such knowledge can cut waste, improve health 
outcomes using existing treatments, and inform 
drug development. For example, it is now clear 
that individuals with a certain mutation (which 
is mostly found in Asian people) respond bet- 
ter to the lung-cancer drug Tarceva (erlotinib; 
W. Pao et al. Proc. Natl Acad. Sci. USA 101, 
13306-13311; 2004), and the discovery of a 
mutation that causes 4% of US cystic fibro- 
sis cases led to the development of the drug 
Kalydeco (ivacaftor). 

The Chinese government is expected to offi- 
cially announce the initiative after it approves 
its next five-year plan in March. Just how much 
the effort will cost is unclear — but it will 
almost certainly be larger and more expensive 
than the US$215-million US initiative. 

Since last spring, Chinese media has been 
abuzz with estimates of a 60-billion yuan 
(US$9.2-billion) budget, spread over 15 years. 
But this figure is not finalized, cautions Zhan 
Qimin, director of the State Key Laboratory of 
Molecular Oncology at Peking Union Medi- 
cal College in Beijing, who is involved in the 
initiative. He says that the effort will consist 
of hundreds of separate projects to sequence 
genomes and gather clinical data, with support 
for each ranging from tens of millions of yuan 
to more than 100 million yuan. 

Anticipating the initiative, leading institutes 
— including Tsinghua University, Fudan Uni- 
versity and the Chinese Academy of Medical 
Sciences — are scrambling to set up precision- 
medicine centres. Sichuan University’s West 
China Hospital, for instance, plans to sequence 
1 million human genomes itself — the same 
goal as the entire US initiative. The hospital will 
focus on ten diseases, starting with lung cancer. 

Both the US and the Chinese efforts will focus 
on genetic links to diseases that are particularly 
deadly, such as cancer and heart disease. But 
China will target specific cancers, such as stom- 
ach and liver cancer, which are common there. 

The Chinese initiative is part of a series of 
research-funding efforts that will replace two 
major grant programmes, known as 863 and 
973, that are due to be phased out by 2017. The 
new programmes will be “more organized, 
more efficient’, says Zhan. 

Genome-sequencing companies are already 
vying to provide services to deal with the antici- 
pated demand. For several years, China has 
boasted high genome-sequencing capacity. 
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> In 2010, the genomics institute BGI in 
Shenzhen was estimated to host more sequenc- 
ing capacity than the entire United States. This 
was thanks to its equipment, purchased from 
Illumina of San Diego, California, which at the 
time represented state-of-the-art technology. 
But Illumina has since sold upgraded machines 
to at least three other genomics firms — WuXi 
PharmaTech and Cloud Health, both in Shang- 
hai, and the Beijing-based firm Novogene. 

Jason Gang Jin, co-founder and chief execu- 
tive of Cloud Health, says that this trio, rather 
than BGI, will be the main sequencing sup- 
port for China's precision-medicine initiative 
— although BGI’s director of research, Xu Xun, 
disagrees. Xu says that precision medicine is a 
priority for BGI and that the organization has 
a diverse portfolio of sequencers that still gives 
it an edge. “If you are talking about real data 
output, BGI is still leading in China, maybe 
even globally,” he says. BGI has already estab- 
lished a collaboration with the Zhongshan 
Hospital’s Center for Clinical Precision Medi- 
cine in Shanghai, which opened in May 2015 
with a budget of 100 million yuan and is run 
by Fudan University. 


NUMBERS GAME 

Regardless of the details, Jin thinks that China 
will be faster than the United States at sequenc- 
ing genomes and identifying mutations that 
are relevant to personalized medicine because 
China’s larger populations of patients for each 
disease will make it easier to find sufficient 
numbers to study. 

Still, it remains to be seen whether China 
has the resources to apply these insights to the 
individualized care of patients. “China wants 
to do it, and everybody is very excited,” says Ta 
Jen Liu, project director at the MD Anderson 
Cancer Center in Houston, Texas, who helps to 
establish collaborations in China and is famil- 
iar with the precision-medicine scene there. 

But there are hurdles. He notes that Chinese 
researchers and pharmaceutical companies 
have not had much success in developing 
drugs so far; that the pathologists needed to 
diagnose specific diseases are scarce in China; 
and that physicians there are notoriously over- 
worked. “Doctors are always overwhelmed 
with patients, seeing 60 or 70 a day,” he says. 
“They don't have time to sit down and think 
about what is best for specific patients” 

David Weitz, a physicist at Harvard 
University who is starting a company in 
Beijing to develop diagnostic instruments for 
use in precision medicine, agrees that there 
will be obstacles, but notes the initiative'’s assets. 
“We need lots of data to validate ideas, to vali- 
date tests,’ he says. “There's lots of data here” 

He thinks that this, combined with the 
Chinese government's determination to suc- 
ceed, will mean that the effort will ultimately 
win out. “They really seem devoted to meeting 
the needs of the society,’ he says. “It’s an exciting 
thing, to try to help that many people.” m 
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Most cases of Guinea-worm disease in Chad have occurred in communities based along the Chari River. 


INFECTIOUS DISEASE 


Dogs thwart end 
to Guinea worm 


Epidemic in dogs complicates push to wipe out parasite. 


BY EWEN CALLAWAY 


decades-long push to make Guinea- 
Aw disease the first parasitic infec- 

tion to be wiped out is close to victory. 
But a mysterious epidemic of the parasite in 
dogs threatens to foil the eradication effort. 

“Tf we're going to be aggressive and achieve 
this, we have to eliminate the infection in 
dogs,” says David Molyneaux, a parasi- 
tologist at the Liverpool School of Tropical 
Medicine, UK. 

The Carter Center in Atlanta, Georgia, 
is leading the global campaign to eradicate 
Guinea worm. Next week, it will announce 
that case numbers for the excruciatingly pain- 
ful infection are at a record low, with approx- 
imately 25 cases reported in 2015 in just 
4 countries: Chad, Ethiopia, Mali and South 
Sudan. But infections in dogs are soaring in 
Chad, where officials will meet at the end of 
January to grapple with the canine epidemic. 
The central African nation recorded more 
than 450 cases of Guinea worm in domestic 
dogs last year — an all-time high (see ‘Canine 
comeback). 

Researchers and officials strongly suspect 
that dogs are spreading the infection to 
humans; now the race is on to understand 
how this might happen, as well as how dogs 
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acquire the infection in the first place. The 
World Health Organization is unlikely to 
declare Guinea worm eradicated until the 
parasite has stopped spreading in dogs, says 
Molyneaux, who is part of the commission 
that will make that decision. 

In 1986, when the Carter Centre joined 
the Guinea-worm eradication campaign, 
there were an estimated 3.5 million infections 
annually, mostly due to poor sanitation and 
lack of access to clean water. 

When people drink unfiltered water, they 
can swallow microscopic freshwater crusta- 
ceans called copepods, which Guinea-worm 
larvae infect. The copepods die, releas- 
ing the larvae, which mature and mate in 
the human intestine. Male worms die after 
mating, but adult females — approximately 
80 centimetres in length — survive and slowly 
migrate out of the gut. About a year after 
infection, they burrow through their host's 
skin, usually around the legs and feet, some- 
times taking weeks to fully escape. To cope 
with the searing pain, many people bathe in 
rivers and lakes, contaminating the water with 
the next generation of larvae. Although rarely 
fatal, Guinea worm can debilitate people for 
months and keep children out of school. 

There is no vaccine against the parasite 
and no effective treatment, so eradication 
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efforts have focused on providing clean water 
and changing people’s behaviour, says Donald 
Hopkins, a special adviser at the Carter Center 
who is leading its Guinea-worm eradication 
efforts. People in areas in which the parasite 
was once rife have learnt to filter their water 
using cloths and to avoid re-contaminating 
water supplies. Even the most out-of-the-way 
villages now quickly contain cases and report 
them to health officials. 

Chad was on the cusp of being declared free 
of Guinea worm in the late 2000s: no case had 
been recorded in the previous decade. But start- 
ing in April 2010, increased surveillance turned 
up a handful of human infections, and around 
60 cases have been recorded since then. 

The cases are unusually sporadic and isolated 
from one another, says Mark Eberhard, a para- 
sitologist who consults on Guinea-worm erad- 
ication for the Carter Center. More typically, 
cases occur in clusters and recur in the same 
village year after year. “There was no increase or 
explosion of cases as one would expect,” he says. 

Shortly after these observations, officials 
began to hear rumours of Guinea-worm- 
infected dogs in Chad. Researchers have 
known for decades that dogs, leopards and 
other mammals occasionally acquire Guinea- 
worm-like infections, but they assumed that 
these cases stemmed from distinct species of 
Dracunculus, the nematode worm that causes 
the disease, or were rare examples of infections 


CANINE COMEBACK 


The number of Guinea-worm cases in dogs is soaring in Chad — a development that 


threatens global efforts to eradicate the parasite. 
@ Humans 
l™ Dogs 
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that had somehow spilt over from an outbreak 
in humans. 

But in Chad, researchers now think that dogs 
are spreading the worms to humans — not the 
other way around. Between January and Octo- 
ber 2015, officials recorded 459 canine infec- 
tions from 150 villages in the central African 
nation — an unprecedented volume. And 
genome sequencing has confirmed that dogs in 
Chad are infected by the same nematode worms 
(Dracunculus medinensis) that plague humans 
(M. L. Eberhard et al. 


Am. J. Trop. Med. Hyg. “If we’re going 
90, 61-70; 2014). to be aggressive 

To better under- and achieve 
stand the situation, this, we have 
ateam led by James tg eliminate 
Cottonand Caroline the infectionin 
Durrant, genome dogs.” 


scientists at the Well- 
come Trust Sanger Institute in Hinxton, UK, is 
now sequencing the genomes of more Guinea 
worms collected from dogs and humans in 
Chad to confirm that dogs are indeed transmit- 
ting the disease to people. And Eberhard, who is 
convinced that this is the case, is trying to deter- 
mine how dogs become infected in the first 
place. They are unlikely to contract the worms 
from drinking water, he says, because dogs tend 
to scare away copepods when they lap. Most of 
Chad's cases have occurred among fishing com- 
munities along the Chari River, and Eberhard 


Cases tend to spike in the 
summer, during annual 
freshwater-fish harvests 
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suspects that dogs are eating the entrails of 
gutted, copepod-eating fish. Dogs then pass the 
worms to humans by reintroducing the larvae 
into water. 

Researchers, including Eberhard, are testing 
aspects of this hypothesis in ferrets, a common 
animal model in disease research, but eradica- 
tion officials in Chad are not waiting for the 
results before taking action. Since February 
2015, they have offered the equivalent of US$20 
to people who report Guinea-worm cases in 
dogs and tie up the animals to prevent them 
from contaminating water sources. They are 
also encouraging villagers to bury fish entrails 
to keep dogs from eating them. And a trial is 
ongoing to test whether a drug used to treat 
heartworm — a roundworm parasite com- 
mon in dogs — has any effect on Guinea worm. 
Because of Guinea worm’s one-year incubation 
time, it should be clear before the end of 2016 
whether these interventions have worked. 

Older residents from villages along the Chari 
River say that their fishing practices have not 
changed, according to Hopkins, and they can- 
not recall dogs becoming infected with Guinea 
worm in the past. But Molyneaux says that the 
dearth of humans transmitting the disease 
could explain the parasite’s jump to dogs. “If 
you were Guinea worm and there were only 100 
of you left in the world,’ he says, “what would 
you do? Youd get the hell out of the host that’s 
being targeted and move to something else.” m 
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Clouds in Antarctica can influence weather the world over. 


CLIMATE SCIENCE 


Antarctic clou 
study takes of 


Scientists probe atmospheric physics above ice sheet for the 


first time since the 1960s. 


BY ALEXANDRA WITZE 


n Antarctica’s Ross Island, a short 
() drive from the US McMurdo research 

station, high-tech radar antennas and 
other atmospheric instruments gaze skyward, 
gathering detailed measurements of West 
Antarctic clouds. Remarkably, these are the 
first such data to be gathered in five decades 
— even though weather patterns in the region 
can influence those half a world away. 

The US$5-million project, known as the 
Atmospheric Radiation Measurement West 
Antarctic Radiation Experiment (AWARE), 
began to observe the skies near McMurdo in 
November and will run until early 2017. A 
second measurement station, 1,600 kilometres 
away in the ice sheet’s interior, will operate 
until the end of this month. (The site is so 
remote that it can be used only during the 
Antarctic summer.) 

A similar experiment in the Arctic in 
1997-98 relied on an instrument-laden ship 
that was deliberately frozen into sea ice. It 
yielded fundamental insights into the physics 
of northern polar clouds’, and AWARE scien- 
tists hope that their project will do the same 
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for the south. “This is going to be a sea change 
in our understanding,” says Lynn Russell, an 
atmospheric scientist at the Scripps Institution 
of Oceanography in La Jolla, California, and a 

co-principal investigator on AWARE. 
Antarctica’s massive ice sheet acts as a 
global heat sink. As a result, changes in 
Antarctic clouds, such as the amount of 
ground they cover or 


“This is goin how much radiation 
tobea a J they absorb, can have 
change in our ripple effects as far 


away as the tropics. 
Climate modellers 
need to understand 
the physics of these clouds if they are to cor- 
rectly work out how weather around the globe 
will change as the polar regions warm. 

Scientists have not made detailed, in-place 
measurements of the skies above West Ant- 
arctica since 1967, when weather-balloon 
launches ceased a decade after they began 
during the 1957-58 International Geophysi- 
cal Year, says Russell. 

AWARE, which is led by Scripps atmos- 
pheric scientist Dan Lubin, aims to get the best 
data yet on clouds and aerosol particles above 


understanding.” 
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West Antarctica. That includes mixed-phase 
clouds, which occur in polar regions and com- 
bine supercooled water with ice. Studies have 
shown that clouds moving across Antarctica’s 
interior are mostly ice, whereas those moving 
onshore from the coast contain more liquid 
water’. The composition of these clouds plays 
a major part in determining how much sun- 
light they reflect into space — which helps to 
shape atmospheric circulation and weather 
patterns below. 

Satellites such as NASA's CloudSat and 
CALIPSO (Cloud-Aerosol Lidar and Infrared 
Pathfinder Satellite Observations) can probe 
the internal structure of Antarctic clouds’, 
but in only a narrow ribbon as seen directly 
beneath the spacecraft’s orbit. AWARE uses 
multiple radar instruments and a sophisticated 
lidar system to explore the clouds’ many lay- 
ers, examining properties such as phase and 
particle size at various altitudes. 

Early AWARE data show mixed-phase 
clouds over McMurdo, in the first detailed 
measurements of such cloud systems outside 
the Arctic. “The Antarctic is a very different 
environment than the Arctic, because it is 
colder year-round and also has a very pris- 
tine atmosphere,” says Lubin. Team scientists 
reported early results on 16 December at the 
American Geophysical Union meeting in San 
Francisco, California. 

The team has also clocked pulses of 
humidity swinging in and out of the McMurdo 
area as a storm passed through, altering how 
the clouds transmit radiation. 

Getting the basic data should help 
scientists to better understand how Antarc- 
tic clouds will respond to a changing climate, 
Russell says. West Antarctica is warming by 
as much as 0.4°C per decade, and as its ice 
melts, sea levels will rise. The AWARE meas- 
urements from the West Antarctic interior 
are designed to capture the height of the 
summer melt season. 

One major question is how climate change 
may be intensifying westerly winds around 
Antarctica, and what those changes will 
do to southern polar clouds, says Andrew 
Vogelmann, an atmospheric scientist at 
Brookhaven National Laboratory in New 
York. With one AWARE location near the 
coast and another in the interior, project 
scientists aim to compare how atmospheric 
systems passing through West Antarctica 
affect both locations, and how those changes 
translate to wider global shifts. 

One final twist, Vogelmann adds, is the pres- 
ence this year of the El Nifio weather pattern, 
which could affect conditions at the poles. “We 
may be able to catch some of that; he says. m 
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IN FOCUS | NEWS 


Dutch lead European push 
to flip journals to open access 


Academic consortia urge faster changes in scholarly publishing. 


BY DECLAN BUTLER 


" | ‘a Netherlands is leading what it hopes 
will be a pan-European effort in 2016 
to push scholarly publishers towards 

open-access (OA) business models: making 

more papers free for all users as soon as they 
are published. 

In 2014, publishers worldwide made 17% 
of new papers OA immediately on publica- 
tion, up from 12% in 2011 (see ‘Growth of 
open access’). But most papers are still locked 
behind paywalls when they are first published. 
The Dutch government, which took over the 
six-month rotating presidency of the Euro- 
pean Union council of ministers this month, 
has declared furthering OA to be one of its top 
priorities. 

With strong support from Carlos Moedas, 
the EU’s research commissioner, it is planning 
a series of discussions on the issue — between 
European science ministers at the end of 
January (with a keynote talk from Bill Gates, 
whose philanthropic foundation strongly sup- 
ports OA) and at an EU presidency conference 
on open science in April. At that forum, the 
European Commission is expected to launch 
an ‘Open Science Policy Platforn’ with a remit 
that includes investigating how subscription 
publishers can best transition to OA. 

The Association of Universities in the Neth- 
erlands (VSNU), a consortium of 14 institutes, 
has already taken radical steps. With backing 
from the Dutch government, it has negotiated 
several deals with major publishers over the past 
two years to make more Dutch papers open in 
subscription journals, with the aim of shifting 
the journals to an OA business model. The deals 
are a “great step forward to an OA world’, says 
Paul Ayris, head of library services at Univer- 
sity College London and a spokesperson for 
the League of European Research Universities, 
which has urged the commission and the Dutch 
presidency to speed the OA transition. 


OPEN-ACCESS DEALS 

In 2014, the VSNU announced a deal in which 
it renewed its subscription to a bundle of 2,000 
paywalled journals from the publisher Springer, 
but with terms that made papers by corre- 
sponding authors at subscribing Dutch uni- 
versities OA, for no extra charge. (Springer has 
since merged with Nature’s publisher.) Shortly 
before Christmas 2015, the VSNU announced 


a similar agreement with Elsevier, which 
the consortium had threatened to boycott if 
its demands were not met: by 2018, 30% of 
Dutch papers will be OA in VSNU-subscribed 
Elsevier journals. 

The hope, Ayris says, is that if other nations’ 
organizations can make similar deals, publishers 
will be compelled to release more open papers 
in return for their flow of subscription income, 
effectively flipping their journals to become 
fully OA. OA journals receive no subscription 
income and instead make money either by direct 
subsidy or by charging authors (or their research 
funders) a fee to publish each OA paper. 


GROWTH OF OPEN ACCESS 


The worldwide share of papers that journals make 
open immediately on publication is rising slowly. 
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The United Kingdom has gone down the 
same track. In October 2015, Jisc, a non-profit 
body that represents UK higher-education 
institutions, negotiated a deal that made OA 
papers with UK-based corresponding authors 
free in 1,600 selected Springer subscription 
journals. A spokesperson for Springer says 
that the agreements are pilots, but that “deals 
which combine subscriptions with OA pub- 
lishing could accelerate the transition to OA 
ona large scale”. 


HYBRID CRACKDOWN 

A major driving force for the Dutch and Brit- 
ish deals was to combat the expensive and 
controversial ‘hybrid’ business models that 
have been adopted by many subscription 
journals worldwide. Hybrid journals col- 
lect subscriptions but allow authors to make 
individual papers open for a fee. They charge 
higher fees, on average, than do fully OA jour- 
nals, yet scientists who want OA papers often 
choose to publish with them because they are 
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generally more established or prestigious than 
many recently launched OA journals. 

Robert Kiley, who is head of digital services 
at the library of the Wellcome Trust, the Lon- 
don-based biomedical funder, notes that many 
UK organizations have each paid millions of 
pounds to hybrid journals for open papers — 
while also paying them subscriptions. A deal 
akin to the VSNU’s one with Springer would 
help to bypass this hybrid market. 

But these kinds of deals have their critics. 
The costs of the agreements are confidential, 
points out Mark McCabe, an economist at the 
University of Michigan in Ann Arbor; he sur- 
mises that they did not come cheap. He and 
others say that such secretive deals risk locking 
academic institutions into continuing to pay 
expensive fees to major subscription publish- 
ers, and they shield the latter from competition. 

McCabe proposes a more radical strategy: 
libraries or university consortia should stop 
paying journal subscriptions and should trans- 
fer the money saved to their researchers, who 
can use it to publish OA in journals of their 
choice. That way, authors might become more 
sensitive to the price of publishing — which 
might lead to greater competition between 
journals, promoting leaner-run OA journals 
that charge lower fees. 

Some funders are trying other ideas to 
support OA but steer researchers away from 
the hybrid market. The Norwegian Research 
Council and the German Research Foundation 
both pay OA fees for researchers but prohibit 
them from being spent on articles in hybrid 
journals. And the Austrian Science Fund 
has capped OA payments at a certain level; if 
researchers want to publish in more expensive 
journals (often the hybrids), they must find the 
extra cash themselves. 

But measures to change industry business 
models will succeed only with international 
buy-in. And some other nations, such as the 
United States, have not followed the Neth- 
erlands in urging the publishing industry to 
make more papers immediately OA. They have 
favoured other routes to free-to-read papers, 
such as encouraging academics to archive their 
pre-publication manuscripts online, and man- 
dating subscription publishers to make papers 
free after a delay (typically six months or a year 
after publication). A successful push for imme- 
diate OA, Kiley says, would ultimately need to 
be global — not limited to Europe. = 
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What to look out for in 2016 


Space missions, carbon capture and gravitational waves are set to shape the year. 


SUCKING UP CO, 

A Swiss company is set to become the first firm 
to capture carbon dioxide from the air and 
sell it on a commercial scale, a stepping stone 
to larger facilities that could one day help to 
combat global warming. Around July, Clime- 
works will start capturing some 75 tonnes of 
CO, per month at its plant near Zurich, then 
selling the gas to nearby greenhouses to boost 
crop growth. Another company — Carbon 
Engineering in Calgary, Canada, which has 
been capturing CO, since October but is yet 
to bring it to market — hopes to show that it 
can convert the gas into liquid fuel. Facilities 
worldwide already capture the gas from power- 
plant exhausts, but until 2015 only small dem- 
onstration projects sucked it up from air. 


CUT-AND-PASTE GENES 

Human trials will get under way for treat- 
ments that use DNA-editing technologies. 
Sangamo Biosciences in Richmond, Cali- 
fornia, will test the use of enzymes called 
zinc-finger nucleases to correct a gene 
defect that causes haemophilia. Working 
with Biogen of Cambridge, Massachusetts, 
it will also start a trial to look at whether 
the technique can boost a functional form of 
haemoglobin in people with the blood disor- 
der B-thalassaemia. Scientists and ethicists 
hope to agree on broad safety and ethical 
guidelines for gene editing in humans in 
late 2016. And this year could see the birth 
of the first gene-edited monkeys that show 


symptoms of the human disorders they are 
designed to model. 


HIGH COSMIC HOPES 

Physicists think there is a good chance that 
they will see the first evidence of gravita- 
tional waves — ripples in space-time caused 
by dense, moving objects such as spiralling 
neutron stars — thanks to the Advanced 
Laser Interferometer Gravitational-Wave 
Observatory (Advanced LIGO). And Japan 
will launch Astro-H, a next-generation X-ray 
satellite observatory that, among other things, 
could confirm or refute the claim that heavy 
neutrinos give off dark-matter signals known 
as bulbulons. Hints ofa potential new particle 
from the supercharged Large Hadron Collider 
(LHC), which has been running at record ener- 
gies since last June, could become clearer as 
the machine rapidly accumulates data. Even if 
the particle is not confirmed, the LHC could 
still unearth other exotic phenomena, such as 
glueballs: particles made entirely of the carriers 
of the strong nuclear force. 


RISKY RESEARCH 

Scientists will soon hear whether funding for 
research that makes viruses more dangerous 
can resume. In October 2014, the US govern- 
ment abruptly suspended financial support 
for ‘gain-of-function’ studies. These experi- 
ments could increase understanding of how 
certain pathogens evolve and how they can 
be destroyed, but critics say that the work 


The light-driven spacecraft LightSail will undergo a test mission in April. 
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also boosts the risk of, for example, acciden- 
tal release of deadly viruses. A risk—benefit 
analysis was completed in December 2015, 
and the US National Science Advisory Board 
for Biosecurity will issue recommendations 
in the next few months on whether to resume 
funding — potentially with tightened restric- 
tions on the research. 


TO MARS AND BEYOND 

The orbits of Earth and Mars will bring the 
planets close to each other this year, creating 
the perfect opportunity for a trip to the red 
planet. A joint mission between the Euro- 
pean Space Agency (ESA) and Roscosmos 
will capitalize on that chance. Launching in 
March, ExoMars 2016 will analyse gases in 
Mars’s atmosphere and test landing technol- 
ogy. Farther afield, NASA‘s Juno mission will 
arrive at Jupiter in July. In September, ESA’s 
craft Rosetta will make a death dive into the 
comet it orbits; mourners can console them- 
selves with the launch of NASA’s OSIRIS-REx, 
a mission to bring back samples from the 
asteroid Bennu. 


COMMERCIAL GAINS 

One lucky research group will win a $50-mil- 
lion grant for heart-disease research from 
Internet giant Google and the American Heart 
Association. Google’s disease-research port- 
folio is growing, and neuroscientists are eager 
to see what Thomas Insel, former director of 
the US National Institute of Mental Health, 
will do at the firm, where he has been leading 
a mental-health effort since November. Private 
funding could also make its mark in space: the 
non-profit Planetary Society in Pasadena, 
California, plans to launch a US$4.5-million 
mission in April to test its light-driven space- 
craft, LightSail. 


SPACE DRIVE 

Hot on the heels of the launch of the US$100- 
million Dark Matter Particle Explorer 
(DAMPE) last December, China’s National 
Space Science Center will launch the second 
and third space-science probes in its planned 
series of five. The world’s first quantum com- 
munications test satellite will blast off in June, 
and the Hard X-ray Modulation Telescope — 
which will scour the sky for energetic sources 
of radiation, such as black holes and neutron 
stars — will fly by the end of the year. Septem- 
ber will see China complete construction of 
the 500-meter Aperture Spherical Radio Tele- 
scope (FAST), which will supersede Puerto 
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Ricos Arecibo Observatory as the 
world’s largest radio telescope. In 
Hawaii, the team behind the con- 
troversial Thirty Meter Telescope, 
which had its construction permit 
revoked in December, will try to 
work out whether and how it can 
move the project forward. 
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MICROLIFE REVEALED 

“ The first results from an ambi- 
” tious project to analyse the 
world’s microbial communi- 
ties are expected this year. The 
Earth Microbiome Project, 
which launched in 2010, aims to 
sequence and characterize at least 
200,000 samples of microbial 
DNA taken from everything from 
Komodo dragon tongues to soil in 
the Siberian tundra. The project 


promises to uncover unprecedented levels of 


biological diversity. 


POLITICAL UPHEAVAL 

In November, the United States will elect 
a new president. If a Republican takes the 
White House, long-debated plans to bury 
nuclear waste at Yucca Mountain in Nevada 
may well resurface, and federal funding for 
climate and social science could face the chop. 


Komodo dragon saliva has been sampled for the Earth Microbiome Project. 


And if Canada’s Liberal government lives up to 
its pre-election promises, the country will get 
a chief science officer, who researchers trust 
will arrive with a drive to rebuild the depleted 
ranks of government scientists. 


DREAM GENES 

Neuroscientists hope to finally identify genes 
that are crucial to regulating the timing and 
duration of sleep but have been difficult to 


tease out, possibly because they 
also have other functions in the 
brain. Pinpointing these genes 
could shed light on sleep dis- 
orders and some psychiatric 
illnesses, which scientists now 
realize are linked to highly dis- 
rupted sleep patterns. 


LET THERE BE LIGHT 

The SESAME (Synchrotron- 
light for Experimental Science 
and Applications in the Mid- 
dle East) facility will switch 
on in Jordan towards the end 
of 2016. The ring-shaped par- 
ticle accelerator will generate 
intense light to probe materials 
and biological structures down 
to the atomic level. It is the 
region’s first major international 
research facility, and a rare collaboration 
between governments including Iran, Israel 
and the Palestinian Authority. Support to 
build a similar facility in Africa is likely to 
gather pace. And in June, scientists will get 
to use bright X-ray beams at the world’s first 
fourth-generation synchrotron, MAX IV in 
Lund, Sweden. m= 
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physics 
of life 


irst, Zvonimir Dogic and his students 
took microtubules — threadlike pro- 
teins that make up part of the cell’s 
internal ‘cytoskeleton — and mixed 
them with kinesins, motor proteins that travel 
along these threads like trains on a track. Then 
the researchers suspended droplets of this 
cocktail in oil and supplied it with the molecular 
fuel known as adenosine triphosphate (ATP). 

To the team’s surprise and delight, the 
molecules organized themselves into large-scale 
patterns that swirled on each droplet’s surface. 
Bundles of microtubules linked by the proteins 
moved together “like a person crowd-surfing at 
a concert’, says Dogic, a physicist at Brandeis 
University in Waltham, Massachusetts. 

With these experiments, published’ in 2012, 
Dogic’s team created a new kind of liquid crys- 
tal. Unlike the molecules in standard liquid- 
crystal displays, which passively form patterns 
in response to electric fields, Dogic’s compo- 
nents were active. They propelled themselves, 
taking energy from their environment — in this 
case, from ATP. And they formed patterns spon- 
taneously, thanks to the collective behaviour of 
thousands of units moving independently. 

These are the hallmarks of systems that 
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From flocking birds to swarming molecules, 
physicists are seeking to understand “active 
matter’ — and looking for a fundamental 


theory of the living world. 


BY GABRIEL POPKIN 


physicists call active matter, which have become 
a major subject of research in the past few years. 
Examples abound in the natural world — 
among them the leaderless but coherent flock- 
ing of birds and the flowing, structure-forming 
cytoskeletons of cells. They are increasingly 
being made in the laboratory: investigators have 
synthesized active matter using both biologi- 
cal building blocks such as microtubules, and 
synthetic components including micrometre- 
scale, light-sensitive plastic ‘swimmers’ that 
form structures when someone turns on a lamp. 
Production of peer-reviewed papers with ‘active 
matter’ in the title or abstract has increased from 
less than 10 per year a decade ago to almost 70 
last year, and several international workshops 
have been held on the topic in the past year. 


THE SECRET OF LIFE 

Researchers hope that this work will lead 
them to a complete, quantitative theory of 
active matter. Such a theory would build on 
physicists’ century-old theory of statistical 
mechanics, which explains how the motion 
of atoms and molecules gives rise to every- 
day phenomena such as heat, temperature 
and pressure. But it could go much further, 
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providing a mathematical framework for still- 
mysterious biological processes such as how 
cells move things around, how they create and 
maintain their shapes and how they divide. 
“We want a theory of the mechanics and sta- 
tistics of living matter with a status compara- 
ble to what’s already been done for collections 
of dead particles,’ says Sriram Ramaswamy, a 
physicist and director of the Tata Institute of 
Fundamental Research's Centre for Interdisci- 
plinary Sciences in Hyderabad, India. 

It could be a while before that want is 
satisfied, however. Experimentalists are only 
beginning to gain control of active materials in 
the lab. Even the most enthusiastic proponents 
of this research admit that no one has yet pro- 
duced a theory of active matter that describes 
the behaviour of everything from cell parts to 
birds. And if such a theory did exist, it’s far from 
certain that mainstream biologists would see 
value in it. For biologists, the idea that living 
matter is active “would be just so obvious as to 
not really contain very much information’, says 
Jonathon Howard, a molecular biophysicist at 
Yale University in New Haven, Connecticut. 

But that has not kept proponents from imag- 
ining applications such as self-assembling 
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artificial tissue, self- 
pumping microfluidic 
devices and new bio- 
inspired materials — 
although researchers admit that such ideas are 
still far from being realized. “I think it’s too early 
for the field to have an application, because we're 
still kind ofastonished at what can happen,’ says 
Andreas Bausch, a physicist at the Technical 
University of Munich in Germany — “but I do 
think the field needs somebody doing it” 


Flocking birds can 
synchronize to 
make patterns. 


ALL TOGETHER NOW 

All known life forms are based on self-propelled 
entities uniting to create large-scale structures 
and movements. If this didn’t happen, organ- 
isms would be limited to using much slower, 
passive processes such as diffusion to move 
DNA and proteins around inside cells or tissues, 
and many of life's complex structures and func- 
tions might never have evolved. Biologists and 
physicists have speculated for decades about the 
general principles of living matter, but research 
on cellular processes has focused on identify- 
ing the dizzying array of molecules involved, 
rather than on working out the principles by 
which they self-organize. As a result, what is 
now known as active-matter research did not 
really get under way until the mid-1990s. 

One of the most influential early experiments 
was conducted by the team of Stanislas Leibler, a 
biophysicist who was then at Princeton Univer- 
sity in New Jersey and is now at the Rockefeller 
University in New York. The group was among 
the first to show that complex, life-like struc- 
tures could self-assemble from microtubules 
and a few proteins supplied with ATP”. Around 
the same time, an influential model of active 
matter was being developed by Tamas Vicsek, a 
theoretical biophysicist at Eotvés Lorand Uni- 
versity in Budapest. In the early 1990s, Vicsek 
was trying to account for the collective motions 
of bird flocks, bacterial colonies and cytoskele- 
ton components when he realized that no exist- 
ing theory would work. “Tt’s not like equilibrium 
statistical mechanics, where you take a book 
and you find what to do,” says physicist Jean- 
Francois Joanny of the Curie Institute in Paris. 

Instead, Vicsek found a starting point in 
a model of magnetic materials developed in 
1928 by German physicist Werner Heisenberg. 
Heisenberg imagined each atom as a freely 
rotating bar magnet, and found that large- 
scale magnetism emerges when interactions 
between these atomic magnets cause the 
majority of them to align. To explain active 
matter, Vicsek replaced the tiny magnets with 
moving ‘arrows’ symbolizing particles with 
velocities that aligned with the average veloc- 
ity of their neighbours — albeit with a certain 
amount of random error. That led to what is 
now known as Vicsek’s flocking model’. His 
simulations showed that when enough arrows 
were packed into a small enough space, they 
began to move in patterns that closely resem- 
bled the familiar movements of bird flocks and 


fish schools (see ‘Smart swarny). 

“T got excited,” recalls Vicsek, whose 1995 
paper’ on the model has received more than 
3,500 citations. “I was walking up and down 
the corridor and told people I had designed 
the moving version of the Heisenberg model.” 

One physicist attracted to this idea was John 
Toner, who heard Vicsek give a talk on it in 
1994. Toner, now at the University of Oregon 
in Eugene, saw that Vicsek’s swarming arrows 
could be modelled as a continuous fluid. He 
took the standard equations for hydrodynamics, 
which describe fluid flow in everything from tea 
kettles to oceans, and modified them to account 
for how individual particles use energy*. Toner’s 
fluid model and Vicsek’s discrete-particle model 
gave essentially the same predictions for a wide 
range of phenomena, and launched a cottage 
industry of active-matter simulations. 

There was only one problem. Whereas the 
number of simulations was skyrocketing, says 
physicist Denis Bartolo of the Ecole Normale 
Supérieure in Lyons, France, “the number of 
quantitative experiments was constant and 
very close to zero”. Practical work was chal- 
lenging: no one could hope to do controlled 
experiments with 10,000 real birds or fish. And 
at the microscopic scale, few scientists were 
familiar with both the necessary theoretical 
work — being published mainly in physics 
journals — and the biological lab techniques 
needed to purify cellular components. 


PRACTICAL MAGIC 
Only in the late 2000s did the theoretical and 
experimental pieces begin coming together. 
Bausch led one of the first precise, quanti- 
tative experiments. He and his colleagues 
mixed actin, a filament that forms most of the 
cytoskeleton of complex cells, with myosin, 
a molecular motor that ‘walks’ on actin and 
makes muscles contract. The researchers added 
myosin’ natural fuel, ATP, then put the mixture 
ona microscope slide and watched. “We didn't 
do anything; we just added the stuff? Bausch 
says. At low concentrations, the actin filaments 
swam around without recognizable order. But 
at higher densities, they formed pulsating 
clusters, swirls and bands. Bausch and his col- 
leagues immediately recognized and quantified 
phase transitions of the kind that Vicsek and 
others had predicted. Their 2010 paper helped 
to ignite the experimental active-matter field. 
Among the studies that followed were Dogic’s 
2012 microtubule experiments’, which used 
another walking protein, kinesin. The result- 
ing patterns were much more complex and 
dynamic than the ones Bausch saw: the flowing 
microtubules looked like fingerprint whorls in 
motion. Dogic and his team also noticed that 
the orderly alignment of this flow would occa- 
sionally break down and produce ‘defects’: 
discontinuities in the pattern that resemble con- 
verging longitude lines at the North and South 
poles. These defects were dynamic, moving 
around like self-propelled particles. 
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No theory at the time could account for this 
behaviour. But in 2014, Dogic teamed up with 
Bausch and physicist Cristina Marchetti of 
Syracuse University in New York to describe 
the behaviour of active liquid crystals swirling 
on spherical vesicles in terms of the movement 
of defects rather than of individual crystal ele- 
ments’. Furthermore, the group found that it 
could tune the defects’ motion by adjusting the 
vesicle’s diameter and surface tension, suggest- 
ing a possible way to control an active crystal. 

Dogic and his students are now trying to do 
just that. By studying the spontaneous flows of 
microtubules and proteins confined in small, 
doughnut-shaped containers, they hope to lay 
the groundwork for a self-pumping fluid that 
could move molecules around in microfluidic 
devices similar to those that are becoming 
increasingly common in experimental biology, 
medicine and industry. Active matter “changes 
our ideas of what materials can do’, says Dogic. 

But any industrial application will have to 
overcome at least one major roadblock. The bio- 
logical materials currently used in active-matter 
experiments are expensive and time-consuming 
to purify — Dogic’s microtubules come from 
cow brains, and 
Bausch uses actin 
from rabbit muscle 
— and they last 
only a short time 
in the lab. Until a 
cheap, robust, off- 
the-shelf source 
of active-matter 
materials can be 
found, commercial 
use is unlikely, says 
Bausch. 

Advances in synthetic active materials may 
show the way forward. In 2013, New York 
University physicist Paul Chaikin and his 
colleagues described making particles of haem- 
atite, an iron oxide mineral, inside a spherical 
polymer’. When the scientists placed these 
‘swimmers in a solution of hydrogen perox- 
ide and exposed them to blue light, a chemical 
reaction caused the particles to move around 
spontaneously, clumping and unclumping like 
groups of people at a cocktail party. 

In 2013, Bartolo and his colleagues reported 
large-scale flows using even simpler plas- 
tic spheres in a conducting fluid®. When the 
researchers turned on an electric field, the 
spheres began rotating in random directions. 
Athigh enough densities, interactions between 
nearby spheres caused them to spontaneously 
roll, flock-like, in the same direction. 

Such lab-made materials remain primi- 
tive, however, compared with those produced 
in cells by 4 billion years of evolution. Dogic 
notes that the kinesins he uses are much more 
efficient than any human-made motors at con- 
verting energy to motion. And Bartolo is also 
quick to discourage talk of short-term pay-offs. 
“Tm not targeting a specific application, he 


“¢ We’re 
still kind of 
astonished 
at what can 
happen. ?? 
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Smart swarms 


A simple model of interactions among 
self-propelled particles can realistically 
simulate the movement of flocks of 
birds, schools of fish, self-assembling 
proteins in the cell and many other 
forms of active matter. 


Low density: randomness 


When individuals have few neighbours 
to compare themselves to, they mill 
about with no obvious pattern. 
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Tasca in 


+ 


Individuals steer towards 
the average heading of 
their neighbours. 


Higher density: flocking 
As the density increases, the group’s 
motion becomes synchronized. 
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says of his rotating plastic spheres. 

Possible applications aside, active matter 
excites scientists because it so closely resem- 
bles the most complex self-organizing systems 
known: living organisms. In 2011, Dogic and 
his colleagues reported’ that microtubule 
bundles anchored at one end to air bubbles 
on a microscope slide beat in synchronized, 
wave-like patterns eerily reminiscent of the 
hair-like cilia and flagella that protrude from 
the surfaces of some cells. And in his 2012 
paper’, he noted a striking similarity between 
his microtubule flows and cytoplasmic stream- 
ing, a process in which cytoskeletal filaments 
team up to whisk a cell’s contents around like 
“a giant washing machine’, he says. 

The resemblance between lab-prepared 
active matter and living things can be uncanny, 
agrees Jennifer Ross, a physicist at the Univer- 
sity of Massachusetts Amherst. At talks, she 
has shown videos of spherical microtubule- 
kinesin systems and asked audience members 
whether they think they are seeing a real cell. 
“Whenever I present these to cell biologists in 
particular, they are always fooled,” she says. 

But something can look and act like a living 
organism without actually following the same 
rules, cautions Howard. He points out that 
Dogic’s group created something that looks 
and acts very much like a cilium or flagellum 
with its multitude of proteins — but that may, 
in fact, work very differently. “There’s some- 
thing in there about the underlying mecha- 
nism, but it’s extremely abstract,’ he says. 


IS IT ENOUGH? 
To probe whether active-matter theory can 
reveal biological mechanisms, Daniel Needle- 
man, a biophysicist at Harvard University in 
Cambridge, Massachusetts, studied the spindle: 
a microtubule-based structure that controls the 
separation of chromosomes during cell division. 
He wanted to test the idea, suggested by earlier 
theories and experiments, that short-range 
microtubule-kinesin interactions by themselves 
were sufficient to yield spindle-like structures. 
He first used sophisticated microscopes to 
examine extracts from frog egg cells, quan- 
tifying microtubule density, orientation and 
stresses during spindle formation. “It really was 
not clear at all until Dan came along that you 
could measure all these things,” says Howard. 
Needleman then merged his measurements 
with models of how active matter self-organizes. 
In 2014, he and Jan Brugués, a biologist at the 
Max Planck Institute of Molecular Cell Biology 
and Genetics in Dresden, Germany, reported 
that, consistent with theory, the interactions 
they observed among closely spaced micro- 
tubules are enough to produce the spindle and 
keep it stable’’. “People have argued that you 
need more complex processes,’ says Needle- 
man. “But the fact that one can understand so 
much of the spindle without invoking any of 
that shows that it’s certainly not necessary.’ 
Others are using ideas from active matter to 


© 2016 Macmillan Publishers Limited. All rights reserved 


probe how large numbers of cells organize in 
processes such as tissue growth, wound healing 
and the spread of tumours. Theorists including 
Marchetti, Joanny and Frank Jiilicher of the 
Max Planck Institute for the Physics of Com- 
plex Systems in Dresden have modelled tissues” 
and tumours’as flowing cells that self-organize 
through short-range cell-to-cell interactions 
rather than chemical signals. Experimentalists 
are testing such ideas, for instance, by showing 
that active-matter theory can help to describe 
cell organization in a developing fruit-fly wing”. 

Some biologists hope that such studies will 
reveal the fundamental principles that govern 
how cells divide, take shape or move. “It’s like 
Linnaean classification before Darwin came 
along,” says biologist Tony Hyman of the Max 
Planck Institute of Molecular Cell Biology and 
Genetics. “We've got all these molecules, just 
like they had all those species, and we need to 
put some kind of order, some kind of reason 
behind it all?” Active matter, Hyman thinks, 
could provide that reason. 

But even enthusiasts admit that mainstream 
biologists may need convincing. “We used to 
get a lot of papers rejected at the beginning,” 
says Hyman — in part because the manu- 
scripts heavy use of mathematics made it 
hard to find reviewers. Even the phrase ‘active 
matter’ may hinder communication, adds 
Howard. “It’s kind ofa physics-y term? 

Still, Howard and Hyman hope that accept- 
ance will be aided by increasing convergence 
between fields. Among biologists, says Hyman, 
“T think the new generation coming along will 
be trained in physics from the beginning” 

And that’s good, adds Stephan Grill, a 
biophysicist at the Technical University of 
Dresden, because progress in active matter 
calls for scientists who are at the cutting edge 
of both physics and biology. “The pot of gold is 
at the interface,” he says, “but you have to push 
both fields to their limits.” m 


Gabriel Popkin is a freelance writer in Mount 
Rainier, Maryland. 
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r | Vhe energy industry has long met 
demand by varying the rate at which 
it consumes fuel. Controlling the 

output of an oil-fired power plant is much 

like changing the speed of a car — press 
the accelerator pedal and more gas flows 
to the engine. 

But the wind cannot be turned up or 
down. Smart software can make wind 
farms more efficient and responsive. Com- 
puter models can predict wind speed and 
control the number and capacity of tur- 
bines in operation to meet energy demand. 
Low-vibration designs and health moni- 
toring would enable turbines to run more 
smoothly, avoiding expensive failures of 
gearboxes and other components whose 
replacement can cost hundreds of thou- 
sands dollars and take days. 

Optimizing renewables requires data: 
on device performance, energy output 
and weather predictions, seconds to days 
in advance. Vast quantities of informa- 
tion are collected by turbine manufactur- 
ers, operators and utility companies — yet 
hidden in their archives’. The information 
is prohibitively difficult for anyone outside 
to access. 

It took me two years of discussions 
with different energy companies and the 
signing of several non-disclosure docu- 
ments to obtain enough data to carry out 
a study on the performance of wind farms 
in Iowa, for instance. Wind-turbine data 
are usually recorded every 10 seconds and 
averaged over 10 minutes (see ‘Poor per- 
formance’); getting higher-frequency data 
involves obtaining permissions from sen- 


NX sor manufacturers. Even basic data such as 
& wind speeds and historical data on turbine 
Wind-turbine maintenance costs could be cut with a data-driven health-monitoring system. operations were initially impossible to 


obtain. By approaching different partners 
and developing data-sharing agreements, 


we eventually gained limited access to wind 
aredataon = st: 


The lack of data sharing in the renewa- 
e ble-energy industry is hindering technical 

progress and squandering opportunities 

\ \ / | | e | e T for improving the efficiency of energy mar- 
kets. I call on the energy industry to follow 

the examples of defence, commerce and 


Giving researchers access to information on turbine health care and share its data openly so that 
performance would allow wind farms to be optimized seine re came 
through data mining, Says Andrew Kusiak. There is money to be made. Academic 
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> and industrial researchers need first to 
develop suitable wind-farm management 
models and prove their value. Software 
companies can sell energy and weather- 
monitoring and -predicting systems. Large 
technology companies such as the Hewlett 
Packard Enterprise or Google should 
establish wind-energy divisions for plan- 
ning and balancing energy across differ- 
ent states and countries, as General Electric 
has done in wind-turbine manufacturing. 
Leveraging renewable-energy data makes 
economic sense for a product — electricity 
— that is universal. Unlike other commer- 
cial industries, energy utilities do not com- 
pete on the basis of product quality but on 
generation and distribution processes and 
business operations, which are the greatest 
beneficiaries of big-data mining. Efficient 
renewable-energy plants equipped with 
software for accurate power prediction and 
responsive management will be able to take 
advantage of real-time, or ‘spot’, energy 
prices — supplying more when prices and 
demand are high and less when they are 
low. This extra profitability will encourage 
more firms and utility companies to acquire 
renewable-energy assets. 


DATA SCIENCE 

The renewable-energy industry is awash 
with data. Wind-turbine manufacturers 
routinely collect data from hundreds of sen- 
sors on experimental and installed devices, 
measuring, for example, wind speed, oil tem- 
perature, vibration and power generation’. 
Utility companies record similar data from 
boilers and generators. 

‘Balancing authorities’, usually non- 
profit, governmental or private organi- 
zations, match the expected demand for 
energy with the production scheduled 
by utility companies hours ahead of 


POOR PERFORMANCE 


generation. National, state and regional 
meteorological agencies and weather 
forecasters accrue radar data and run 
numerical weather-prediction models 
every 1-3 hours to produce forecasts and 
parameters such as wind speed. 

New sources of data are emerging. The 
wind industry is experimenting with using 
sonar and laser-based lidar measurements 
to anticipate the speed, direction and tur- 
bulence of the wind approaching wind 
farms. Some utility companies fly drones 
over their farms to check turbine blades 
and measure wind speeds and directions 
to improve power prediction and to antici- 
pate fluctuations over minutes to hours. 

Renewable-energy producers operate in 
isolation. If industry players pooled their 
data and monitoring resources, they would 
all benefit. More-efficient and lower-cost 
wind-turbine designs could emerge, allow- 
ing turbines to last longer and produce more 
energy, and allowing output to be more 
accurately predicted. For example, combin- 
ing data from wind farms in different US 
states would dramatically improve the accu- 
racy of predicted hourly changes in power 
production. 

Experiments that are impossible with a 
real wind turbine or a farm can be simu- 
lated ona digital replica’. Different control 
strategies can be tested for maximizing and 
smoothing the energy output. Conditions 
of components and subsystems could be 
analysed to lower maintenance costs — the 
most significant expense of wind-energy 
generation’. Active control of turbine 
vibrations could be studied. More stable 
turbines are less likely to fail and could be 
run beyond their current upper speed limit 
(usually around 20 metres per second) 
to produce more energy. The impact of 
atmospheric conditions on wind-farm sites 


Intermittent faults caused by blade misalignment or electronic problems, for example, can 
reduce the power produced by a single turbine. (Data taken at 10-minute intervals over 5 days.) 
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and energy production could be studied. 

Controlling wind turbines with data- 
driven software could, models show, 
increase energy production by at least 10%, 
and gains of 14-16% are possible. Increasing 
the maximum running speed could easily 
add another 10%. Wind-farm maintenance 
costs could be cut by 10% with a data-driven 
health-monitoring system. 

Yet the wind industry remains largely 
oblivious to data science’. A few utility 
companies are setting up in-house data- 
analytics teams, but the benefits of work- 
ing with academic 
researchers and 


“Defence, nd : 
others are not rec- 
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on turbine opera- 
tions — such as a 
graphical display of 
a turbine output — are broadly welcomed, 
direct interventions are impossible. 

I have been unable to test control solu- 
tions developed in my Intelligent Systems 
Laboratory at the University of lowa in any 
commercial settings. Even public utilities 
and colleges that own and operate wind 
turbines ended negotiations once they real- 
ized that their insurance and maintenance 
contracts would have to be modified. 
Wind-turbine insurance contracts tightly 
prescribe operating conditions and safety 
aspects, sometimes requiring turbines to 
be equipped with specific sensors (such as 
for tower vibration and rotor speed). 

Potential for exposing flaws and poor 
design practices is another obstacle. Manu- 
facturers may not want to reveal perfor- 
mance metrics that are covered by warranty 
terms or design details that might point 
to patent infringements. Competition is a 
worry. 


sharing data.” 


OPEN SHARING 

Other sectors do better. Defence, com- 
merce and health-care organizations have 
developed processes for sharing data with 
the research community while maintain- 
ing confidentiality and security. Some 
have created benchmark data sets to test 
data-analysis algorithms. Others run 
competitions to solve specific problems. 
For example, in 2006, the television- and 
film-streaming service Netflix offered a 
US$1-million prize for an improved algo- 
rithm to predict rating scores of films. In 
2011, the US National Renewable Energy 
Laboratory (NREL) ran the Round Robin 
project, in which they shared high-fre- 
quency vibration data from a healthy® and 
a faulty gearbox with competing teams to 
discover the most accurate ways to diag- 
nose faults. It has been estimated that the 
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Turbine failures, resulting from gearbox and other component faults, could be avoided by sharing data. 


value of the voluntary contributions to the 
project from the 16 participating teams 
(including the Intelligent Systems Labo- 
ratory at the University of Iowa) was worth 
between $2 million and $3 million. 

Non-disclosure agreements outlining the 
specifics of data sharing and results dissem- 
ination are used in data-intensive projects. 
Consumer-goods company Proctor & Gam- 
ble, for example, reveals information about 
a product (a new shampoo or a shaver, for 
instance) early in the design stage to poten- 
tial customers, whose feedback improves the 
final design. On social-media platforms such 
as Facebook, users determine the scope of 
information sharing. 


NEW PROTOCOLS 

The renewable-energy industry should 
adopt similar practices. First, it needs to 
decide which data can be shared and at what 
risk. Wind speed and direction, for example, 
could be released given that anyone could 
measure them. Although data on the real- 
time energy output of an entire wind farm 
should be rightfully protected for competi- 
tive reasons, sharing power produced by one 
or a few turbines would not compromise 
business value. When necessary, data could 
be transformed or anonymized; for exam- 
ple, by reporting relative percentage changes 


rather than absolute power values. 

Wind-energy associations in Asia, 
Europe, South America and North America 
should facilitate the data-sharing discussion. 
A summit of these players should define a 
path to open-access data as follows. 

First, make all renewable-energy stake- 
holders aware of the problems and of the 
benefits of data sharing. Invite representa- 
tives from other manufacturing and service 
industries to present their data-sharing 
practices. 

Second, develop data-sharing protocols 
and governance structures. US Depart- 
ment of Energy laboratories such as the 
NREL and Sandia National Laboratories 
could lead this effort because they collect 
renewable-energy data from some wind- 
farm operators for their own studies. Col- 
lecting data at higher frequencies (in some 
cases), at fraction-of-a-second intervals, 
from more utility companies and facilitat- 
ing open access to them would be the next 
step. Although data collection should ide- 
ally be global, in reality, most useful results 
would be regional. 

Third, develop a data-and-knowledge 
sharing platform for renewable energy. 
Stakeholders must decide how the data are 
to be assembled and pre-processed for use 
by the research community and industry. 
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Ideally, data would flow out to the research 
community and research results in the form 
of new models, algorithms, design solu- 
tions and other results would flow in. The 
vast majority of the results produced by the 
research community would remain open to 
review, scrutiny, future use and benchmark 
studies. Industry could retain ownership of 
the internally generated results as well as 
those produced by research contracts. 

This long-awaited engagement will 
generate new science and greatly benefit 
renewable-energy companies, energy- 
equipment manufacturers and soci- 
ety by bringing more clean energy at a 
lower price. m 


Andrew Kusiak is professor of mechanical 
and industrial engineering, and director of 
the Intelligent Systems Laboratory, at the 
University of Iowa, Iowa City, Iowa, USA. 
e-mail: andrew-kusiak@uiowa.edu 


1. Kusiak, A. Verma, A. & Wei, X. Wind Systems 3, 
36-39 (2012). 

2. Kusiak, A., Zhang, Z. & Xu, G. /EEE Trans. Sustain. 
Energy 4, 756-764 (2013). 

3. Zhang, Z., Zhou, Q. & Kusiak, A. EEE Trans. 
Sustain. Energy 5, 228-236 (2014). 

4. Kusiak, A., Zhang, Z. & Verma, A. Energy 60, 1-12 
(2013). 

5. Kusiak, A. Ind. Eng. 47, 38-42 (2015). 

6. Zhang, Z., Verma, A. & Kusiak, A. /EEE Trans. 
Energy Convers. 27, 526-535 (2012). 


7 JANUARY 2016 | VOL 529 | NATURE | 21 


Glacier National Park in Montana is one of more than 400 sites administered by the US National Park Service, which turns 100 this year. 


Science in culture 2016 


Gear up for some big birthdays, as anniversaries roll around for Star Trek, H. G. Wells 
and the US National Park Service. And jostling for the spotlight are Finding 

Nemo’s fishy crew, a modern twist on haute couture, groundbreaking artists, 
ground-quaking dinosaurs and (perhaps) Keanu Reeves. Daniel Cressey reports. 


100 Years: The US National Park Service 
On 25 August, the US National Park 
Service (NPS) celebrates its centenary 
— 100 years since President Woodrow 
Wilson signed it into existence, building 
on the 1872 creation of Yellowstone 
National Park by his predecessor 
Ulysses S. Grant. The service, which 
protects iconic landscapes from 
California’s Yosemite to the Florida 
Everglades, is hosting events across 

its 409 sites. Of course, the greatest 
show of all, curated by nature itself, 
runs 365 days a year in NPS parks, 
from the Kilauea and Mauna Loa 
volcanoes on Hawaii's Big Island to the 
3,516-kilometre Appalachian Trail on 
the US East Coast — the world’s longest 
footpath open only to hikers. 
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Dinosaur extravaganza 
American Museum of Natural History, New York City 
Starting in January 


New York gets a titanic new resident from 
15 January, when the American Museum 
of Natural History installs a 37-metre-long 
cast of an as-yet-unnamed titanosaur. The 
bones of this giant herbivore, yet to be 
officially designated a species, were dug out 
of the Patagonian desert in Argentina. But 
that is just the start of the ancient-animal 
rollout. From March 2016 to January 

2017, the exhibition Dinosaurs Among Us 
will explore how the titanosaur’s relatives 
evolved to become birds: rare fossils and 
huge models will shed light on everything 
from birds’ bones to dinosaur behaviour 
and brains. And from May 2016 to January 
2017, Ancient Predators in a Modern World 
will tour 200 million years of crocodiles and 
their alligator, caiman and gharial relatives. 
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These remarkably specialized beasts are still 
recognizably the same as their ancestors that 
shared Earth with the dinosaurs. 


manus x machina 
The Metropolitan Museum of Art, New York City 
5 May- 14 August 


NPS/JACOB W. FRANK 


The Costume Institute at New York’s 
Metropolitan Museum of Art aims to show 
the world that there is more to high fashion 
than pouts and peplums: technology 

and style have been in symbiosis from 

the off. From an 1880s Worth gown to a 
2015 Chanel suit, this show contrasts and 
draws parallels between the handmade 
marvel of haute couture (manus) and 
machine-produced clothing (machina). Do 
technologies such as laser cutting, ultrasonic 
welding or 3D printing stand up against 
intricate embroidery and hand-stitched 
linings? Strut in for a look. 
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Moholy-Nagy: Future Present 
Guggenheim Museum, New York 
June - September 


This major retrospective comes 70 years after 
the death of Hungarian industrial designer 
and radical artist Laszl6 Moholy-Nagy, whose 
oeuvre spanned photography, abstract 
painting and metal sculpture. The artist 
founded a school of design in Chicago, Illinois, 
and died in the city in 1946. He was a key 
player in the German Bauhaus movement, 
embracing the transformative power of 
technology and mechanization in kinetic 
artworks (like his contemporary, Alexander 
Calder; see go.nature.com/n8rzsn). In the 
early 1920s, Moholy-Nagy experimented with 
outsourcing paintings by describing them in 
detail over the telephone to a painter in a sign 
factory, with the aid of colour charts and graph 
paper. The exhibition travels to Chicago and 
Los Angeles, California, after its New York run. 


Finding Dory 
Director: Andrew Stanton 
Opens 17 June 


Digital-animation giant Pixar releases the 
much-anticipated follow-up to its 2003 
Finding Nemo, a film so successful that 
clownfish are now often referred to as 
‘nemos’. The original had marine biologists 
in raptures over its faithfulness to the science. 
Pixar has a mixed record when it comes to 
sequels, but if Finding Dory, featuring Nemo’s 
Paracantharus friend (pictured), can combine 
the remarkable accuracy with the superb 
storytelling that the company is capable of, 
it could join Pixar’s list of Oscar- 
botherers. Rumours suggest that 
the film was rewritten after the 
success of Blackfish, the 2013 
documentary by director 
Gabriela Cowperthwaite that 
criticized the controversial 
keeping of killer whales in 
captivity. 


Engineering the World 

Victoria and Albert Museum, London 

18 June - 6 November 

Early in his career, engineer Ove Arup 
(1895-1988) worked on the floating 
Mulberry Harbours — temporary concrete 
breakwaters and piers set up for the Second 
World War Allied landing in Normandy 

on D-Day in 1944. He went on to help 

build iconic structures such as the Sydney 
Opera House, where his instincts for 
aesthetics and materials shone, before 
founding international mega-consultancy 
Arup, whose masterworks range from 
London's new research powerhouse, the 
Francis Crick Institute, to the Victoria and 
Albert Museum's own ongoing expansion 
plans. The exhibition promises to reveal 
Arup’s multidisciplinary approach “as a 
humanistic and technological tool for social 


50 Years: Star Trek 

The world of Star Trek, first brought 

to television by US screenwriter Gene 
Roddenberry in 1966, inspires love in 
seemingly inverse proportion to the 
quality of its set design, special effects and 
(occasionally) acting. What keeps legion 
Trekkies passionate is the lingering glow 
of Roddenberry’s delight in discovering 
“new life and new civilizations”. Since the 
original show — with its then-radical multi- 
ethnic crew (Some members pictured) 

— there have been another 4 live-action 
television series and 12 films, with fans 
from NASA leaders to schoolchildren. With 
the 13th film, Star Trek Beyond, due this 
year and a new television series promised 
in 2017, Star Trek remains the key 
science-fiction universe of modern times. 
(Sorry, Star Wars fans.) 


responsibility” and features prototypes, 
models and digital animations. It is part of the 
museum's Engineering the World exhibition, 
which will also include an installation by 
architect Achim Menges. 


Georgia O’Keeffe 
Tate Modern, London 
6 July — 30 October 


Visceral, often quasi-abstract evocations of 
botanical morphology in close-up paintings 
of irises and petunias helped to establish 
Georgia O’Keeffe’s early career in 1920s 
New York. This retrospective at the Tate 
Modern will demonstrate the remarkable 
range of this groundbreaking modernist 
artist. In the late 1920s, O’Keeffe (pictured) 
moved figuratively and literally west to New 
Mexico, where the exposed, multicoloured 
topography inspired her to paint powerful 
landscapes and surreal studies that 
juxtaposed blossoms, bones, rocks and 
deadwood — paintings that in turn inspired 
others to re-examine the geology and 
biological riches of desert places. 
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ANS (Autonomes NervenSystem) 
Staatsoper, Berlin 
12-16 July 


The autonomic nervous system controls 
unconscious human bodily functions such 

as heart rate and breathing. Greek-born 
musician Irini Amargianaki presents a 
multidimensional exploration of perception, 
incorporating instrumental pieces, video 
projections by Maryna Shuklina and shadow 
puppetry by Lisa Haucke. Premiering in 

Berlin as part of the Infecktion! festival for new 
musical theatre. 


Colour and Vision 
Natural History Museum, London 
15 July -— 6 November 


How did colours appear in the living world, 
and how did animals evolve the ability 

to see them? This exhibition at London’s 
Natural History Museum will take viewers 
from the eyes of the beholder to the art 
and innovation that have emerged from 
nature’s wild palette. Colour is crucial 

— from lights telling you to go or stop 

to tropical frogs that sport pigments 
screaming ‘poison’. 


The Universe and Art 

Mori Art Museum, Tokyo 

30 July - 9 January 2017 

This exhibition asks how humans have 
viewed the Universe through millennia, 
starting with The Tale of the Bamboo Cutter 
— the oldest known piece of narrative prose 
in Japan, which dates back to the tenth 
century — and zipping forward to the age 
of the International Space Station and the 
search for alien life. Ranging from ancient 
art to contemporary astronomy, the show 
promises to explore how people through 
the ages have conceived of the vastness 
around them. > 
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H. G. Wells (right) wrote more than 100 
books, including science-fiction classic The 
War of the Worlds (top). 


150 Years: H. G. Wells 

This year sees two key dates focused on 
prescient author, scientist and educator 
H. G. Wells: the 150th anniversary of 

his birth, and the 7Oth of his death. 
Wells, who wrote more than 100 books, 
including The Time Machine (1895), The 
War of the Worlds (1898) and The Island 
of Doctor Moreau (1896), transformed 
turn-of-the-century science into literature 
that is still read, dissected and argued 
over today. Some of his short stories will 
be brought to life in a series of dramas 
airing on UK channel Sky Arts, and the 
H.G. Wells Society plans a programme of 
events (see go.nature.com/aasjbm). 
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>» Deepwater Horizon 
Director: Peter Berg 
Opens 30 September 


The 2010 Deepwater Horizon explosion 
and oil spill — which killed 11 people and 
spewed more than 3 billion barrels of crude 
oil into the Gulf of Mexico — was the worst 
environmental disaster in recent US history. 
Ecosystems and the regional economy 
suffered: wildlife died, beaches were shut 
and fisheries floundered. Now the events 
leading up to the disaster get the Hollywood 
treatment, as Mark Wahlberg takes on the 
role of an electronics technician on the 
doomed drilling rig. Expect gritty drama and 
courage in the face of adversity. 


Replicas 

Director: Tanya Wexler 

Few details were available on this science- 
fiction thriller as Nature went to press — not 
normally a good sign ina film. But reports 
that Keanu Reeves will play a neuroscientist 
who fights the government, police and the 
very laws of science to resurrect his family 
are intriguing. 


The Douanier Rousseau — Archaic Candour 
Musée d'Orsay, Paris 

22 March - 17 July 

Dozens of pieces by French 
post-Impressionist Henri Rousseau go on 
display at the Musee d’Orsay in Paris after a 
stint in Italy. They include stunning depictions 
of jungles, created by a man who never left 
France but regularly visited the botanical 
gardens of Paris. 


Leonardo da Vinci: The Mechanics of Genius 
Science Museum, London 
10 February — 4 September 


Who needs the Mona Lisa when you can have 
a flying machine? This touring exhibition 

of ingenious models brings to life the 
mechanical contraptions sketched out by 
history’s greatest polymath. 


Strandbeest: The Dream Machines of 
Theo Jansen 

Exploratorium, San Francisco, California 

27 May—5 September 

Arriving in California in May, far from their 
birthplace on the Dutch coast, are the vast, 
surreal-looking Strandbeest (‘beach beast’) 
automatons created by artist Theo Jansen. 
These mechanical animals move on gusts 
of wind and have amazed viewers with 
their strange dances, at once robotic and 
naturalistic. m 
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China boom leaves 
children behind 


Some 61 million children were 
left behind by migrant parents 
in China in 2010-14, or almost 
22% of the country’s children 
(see go.nature.com/clylrn; in 
Chinese). This side effect of 
urban development is seriously 
affecting the mental and physical 
health of those abandoned 
juveniles who are uncared for. 
We urge China's government to 
weigh this social damage against 
economic gain and to take steps 
to mitigate it. 

There were 274 million 
Chinese migrant workers in 
2014 — an unprecedented 
number. Evidence is increasing 
for the adverse effects of such 
upheaval on some children’s 
physical, psychological and 
social development (Q. Li et al. 
China Econ. Review 36, 367-376 
(2015); G. Ding and Y. Bao 
J. Child Psychol. Psychiatr. 55, 
411-412 (2014)). 

Despite China’s great 
economic achievements, 
its childcare services 
remain underfunded and 
underdeveloped. There is an 
urgent need for policy reform 
and strategies to tackle the 
problem. These include the 
development and enforcement 
of family interventions, 
community support and 
schooling improvements. 

Peng Yuan, Long Wang 
Xiangya Hospital, Central South 
University, Changsha, Hunan, 
China. 

wanglong@csu.edu.cn 


Recover wastewater 
resources locally 


As contributors to governmental 
initiatives to reuse wastewater 
pollutants in the European 
Union, the United States 

and China, we consider that 
decentralized recovery of these 
resources could result in more 
environmental, economic and 
social benefits than the near-term 
upgrade of centralized facilities 


(see W.-W. Li et al. Nature 528, 
29-31; 2015). 

Decentralized local treatment 
and reuse facilities avoid the large 
transportation and energy costs 
of conveying treated wastewater 
back to catchment areas for 
reuse. Concentrating nutrients 
for recovery also consumes large 
amounts of energy: urine makes 
up only 1% of total wastewater 
volume, and about 80% of 
nitrogen and 50% of phosphate in 
wastewater are from urine. 

However, nitrogen and 
phosphorus can be locally 
recovered from urine using 
urine-diverting toilets, which 
greatly reduce nutrient loads 
to existing plants (J. Elser and 
E. Bennett Nature 478, 29-31; 
2011). Similar strategies can be 
applied to carbon, leading to 
greater energy recovery through 
co-digestion of sludge with food 
waste, and direct carbon capture 
and storage for climate-change 
mitigation. 

International strategies 
for nutrients, energy and 
water (“NEW initiatives) 
aim to transform the water 
infrastructure for resource 
recovery. By balancing near-term 
goals and long-term ambitions, 
water ‘waste’ should become a 
misnomer. 

Zhiyong Jason Ren University of 
Colorado, Boulder, USA. 

Art K. Umble MWH Global, 
Denver, USA. 
zhiyong.ren@colorado.edu 


Half of samples fail 
protein-blot tests 


Poorly characterized antibodies 
give rise to irreproducible results 
(see, for example, Nature 527, 
545-551; 2015), but so can 

the use of properly validated 
antibodies in a non-validated 
context. 

At Aviva Systems Biology in 
California, we used our highly 
specific commercial antibodies 
in western immunoblot assays 
to test more than 1,000 protein 
samples provided by the research 
community. We found that the 


preparation quality of more than 
half of these samples failed to 
meet the technical requirements 
for a reliable assay signal. 

Simple technical factors 
confounded the electrophoretic 
resolution or antibody 
detectability of the researchers’ 
protein solutions. These included 
unsuitable sample concentrations, 
buffer incompatibility and the 
absence of calibration markers or 
treatment controls. Until uniform 
western-blotting standards are 
widely adopted (see J. E. Gilda 
et al. PLoS ONE 10, e0135392; 
2015), there is a risk that data 
irreproducibility will continue to 
be the norm. 

Antibody-production 
companies should not be 
treated as casinos for boosting a 
researcher's chances ofa positive 
result from such shot-in-the-dark 
samples. 

Matt Landry Aviva Systems 
Biology, San Diego, California, 
USA. 

Aldrin V. Gomes University of 
California, Davis, USA. 
mlandry@avivasysbio.com 


UK budget cuts 
erode Paris promises 


Two weeks before the UK 
government signed up to keep 
global warming well below 

2°C at the 2015 United Nations 
climate summit in Paris, it 
announced a 22% budget cut for 
the Department of Energy and 
Climate Change. It also scrapped 
a previously ring-fenced 
£1-billion (US$1.5-billion) 
budget for developing carbon 
capture and storage. UK 
decarbonization targets might 
be unachievable without this 
technology. 

In our view, these actions 
signal that the UK government 
does not treat climate action as 
a priority, and that it is ignoring 
the evidence of the research 
it funds. For example, data 
collected by the UK Met Office 
show that 2014 and 2015 
have been the warmest years 
on record. 


To regain credibility, the 
government must overcome 
internal division (N. Carter 
and B. Clements Br. Politics 10, 
204-225; 2015) and develop a 
robust climate policy that is in 
line with its stated ambitions. 
Alexander C. Lees Cornell 
University, Ithaca, New York, USA. 
Andrew Balmford, Ben Phalan 
University of Cambridge, UK. 
btp22@cam.ac.uk 


Come together to 
study life’s origins 


Researchers working on the 
origins of life tend to fall into two 
camps — those who investigate 
artificial life and those who 
study the origins of life on Earth 
four billion years ago. The 
communication gulf between the 
two needs to be closed if the field 
is to progress. 

Artificial-life researchers are 
less concerned about how life 
originated on Earth than with 
the idea of life as a universal 
phenomenon — including its 
emergence and self-organization. 
And those pursuing experimental 
verification of mechanisms for 
terrestrial origins are seldom 
drawn to the broad theoretical 
ideas of artificial life. 

The Earth-Life Science 
Institute’s Origins Network, 
working with members of 
the research community, has 
issued a statement to encourage 
fresh approaches to the subject 
(C. Scharf et al. Astrobiology 
15, 1031-1042; 2015). We 
suggest that origins-of-life 
research requires inspirational 
innovation, cross-disciplinary 
collaboration and reassurance 
from institutions that such 
research will be supported. We 
hope that these proposals will 
help to train a new generation of 
scientists to think more broadly 
and less tribally. 

Caleb Scharf Columbia 
University, New York, USA. 
Nathaniel Virgo, H. James 
Cleaves Earth-Life Science 
Institute, Tokyo, Japan. 
caleb@astro.columbia.edu 
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Why black holes pulse brightly 


Black holes can produce oscillating outbursts of radiation that were thought to be associated with high rates of infalling 
matter. The observation of pulses of visible light from a black hole complicates this picture. SEE LETTER P.54 


POSHAK GANDHI 


ccretion of matter onto black holes is 
A: efficient way of converting mass 

into energy, much more so than the 
process of nuclear fusion, which powers the 
light from stars. But unlike fusion, the phys- 
ics behind accretion is still not understood, 
more than 40 years after the identification of 
accreting black holes in the Milky Way’”. On 
page 54 of this issue, Kimura et al.’ present 
exquisite observations made during a black- 
hole accretion episode. They show that the 
visible radiation from the black hole’s vicinity 
oscillates dramatically — sometimes regularly, 
other times not — in a manner not predicted 
by models. Such oscillations were previously 
associated with high rates of infalling matter, 
but the authors report that the observed oscil- 
lations can occur even when the rate of infall 
is low. Understanding this behaviour could 
help astronomers to better understand violent 
accretion episodes onto black holes. 

The researchers studied the black hole V404 
Cygni, which is 2.4 kiloparsecs from Earth. 
The Cygnus constellation is a popular area 
of the sky for black-hole specialists because it 
hosts several other bright, accreting black holes 
and neutron stars. In June 2015, V404 Cygni 
underwent a short-lived accretion ‘outburst’ 
that lasted for about two weeks, causing it briefly 
to become one of the brightest cosmic X-ray 
sources beyond the Solar System. The black 
hole’s gravity is strong enough to strip matter 
off the surface of an orbiting companion star, 
and the potential energy of this infalling matter 
is released, in part, as the observed electromag- 
netic radiation. 

The infalling matter is thought to be hot, 
magnetized plasma. But if this material were 
to plunge directly into the black hole, its energy 
would be lost immediately without any bright- 
ening. The standard picture of accretion is that 
the plasma instead acts as a viscous fluid that 
spirals towards the black hole in the form of a 
disk, and that its energy is liberated as a result 
of friction in the disk. Any plasma that cannot 
be accreted is expelled in the form ofa fast nar- 
row stream (a jet) or as an outflowing wind. 

If there is a balance between the accret- 
ing plasma and frictional energy losses, then 
the mass is steadily accreted. But naturally 
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Unstable illuminated 
plasma disk 


Orbiting star 


Figure 1 | Light pulses from irregular accretion onto black holes. Black holes can accrete matter from 
orbiting stars. The matter is thought to spiral towards the black hole as a plasma disk, and friction in the 
disk causes energy to be released in the form of electromagnetic radiation. Plasma that cannot be accreted 
is expelled as a jet. Kimura et al.” propose that, in systems such as V404 Cygni, the supply of infalling 
matter cannot steadily fill the disk between the companion star and the black hole, causing fluctuations 

in the density of matter in the disk. These fluctuations trigger oscillating emissions of X-rays (white lines 
radiating from black hole) near the black hole that ionize hydrogen atoms in the outer part of the disk and 
cause pulses of visible light (gold region), as observed by the authors. 


occurring changes in the rate of mass accretion 
can upset this balance and cause an unstable 
see-saw-like behaviour: periods of enhanced 
accretion that empty parts of the disk are fol- 
lowed by quieter periods when the parts are 
refilled, after which the cycle begins again. An 
approximate analogy is the repetitive filling 
and emptying ofa Japanese bamboo fountain. 

Such behaviour has been observed in one 
other black-hole system, GRS 1915+105 in 
the Aquila constellation, which undergoes 
high levels of mass accretion. Several classes of 
repetitive oscillation occur in this system, but 
only in its observed X-ray emission’. Kimura 
and collaborators draw parallels between 
GRS 1915+105 and the visible-light oscil- 
lations in V404 Cygni, but make the crucial 
distinction that the latter oscillations occur 
at a much lower rate of mass accretion than 
the former ones. In other words, the repetitive 
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behaviour is not strictly associated with 
episodes of high mass accretion. 

V404 Cygni is an important study target for 
several reasons. It was the first Galactic object 
to have its mass (nine solar masses) firmly 
placed within the range of masses associated 
with black holes*®. Its distance from Earth is 
also known with higher accuracy than those of 
other black holes’. Moreover, it looks extremely 
bright when it accretes matter, despite being 
partly veiled behind interstellar gas and dust. 
In the absence of this veil, V404 Cygni would 
have been one of the most distant objects in the 
Milky Way visible in dark skies to the unaided 
eye in June 2015. Because V404 Cygni is so well 
characterized, Kimura et al. are able to pro- 
pose a mechanism to explain the visible-light 
oscillations. 

The authors suggest that in systems such 
as V404 Cygni and GRS 1915+105 there is a 


relatively large volume of space between the 
companion star and the black hole, which allows 
alarge disk to form. But the supply of infalling 
matter from the companion star is insufficient 
to fill such a large disk with a steady flow. With- 
out a steady flow, the accretion rate becomes 
unstable and can fluctuate violently (Fig. 1). 
These fluctuations, in turn, trigger oscillating 
emissions of energetic X-ray photons near the 
black hole, which then light up the whole disk 
with the observed pulsating visible effects. 

But the authors show that this explanation 
requires the disk to be very large, close to its 
maximum possible size. Moreover, the X-ray 
oscillations that they observed from V404 
Cygni are much stronger than the visible- 
light ones. These puzzling facts will need to 
be accounted for. How, and whether, the jet of 
the black hole tracks these oscillations is also 
yet to be determined. The proposed parallels 


between the observed oscillations and those of 
GRS 1915+105 will undoubtedly be investigated 
in detail in the future. This will help researchers 
to understand the above issues in light of the 
wealth of supporting observations currently 
being analysed by astronomers the world over. 
Black-hole outbursts are unpredictable 
and some can be two weeks or even shorter 
in duration, so worldwide coordination and 
round-the-clock monitoring is essential if we 
are to understand the physics of these extreme 
events. This becomes particularly challeng- 
ing when coordinating observations between 
space telescopes and those on the ground. The 
outburst of V404 Cygni last year invigorated 
the efforts of black-hole astronomers to tackle 
these challenges, with at least one conference 
dedicated entirely to this theme. Amateurs 
can also play a key part in this effort. Kimura 
and colleagues gathered data from many small 


Different worlds 


Patterns of species association reveal that terrestrial plant and animal 
communities today are structured differently from communities spanning the 
300 million years that preceded large-scale human activity. SEE LETTER P.80 


GREGORY P. DIETL 


he British author L. P. Hartley wrote 

| in one of his best-known novels, The 
Go-Between, that “The past is a foreign 
country: they do things differently there.” 
This poignant imagery of remoteness from 
the past captures the essence of an emerging 
global consciousness. Human hegemony over 
nature has become so pervasive and profound 
that it is quite possible that we have created 
a world that has little or no precedent — in 
ecological parlance, it has no analogue. On 
page 80 of this issue, Lyons et al.’ detail a com- 
pelling case that this extraordinary situation 


a Random species pair 


Species 1 


Figure 1 | Species associations. In a terrestrial ecological community, 

any two species may occur randomly (a) at locations in a landscape. 
Alternatively, species pairs may be non-randomly associated, in which case 
they can be either segregated (b), meaning that the two species co-occur less 
frequently than would be expected by chance, or aggregated (c), meaning 


Species 2 


is an undeniable reality for the rules that gov- 
ern how plant and animal communities are 
structured. 

The authors assembled data on the pres- 
ence and absence of terrestrial plant and 
animal taxa for 80 fossil and modern assem- 
blages in North America, Africa and Eurasia, 
spanning the past 300 million years. Using 
a statistical approach that was designed to 
compare occurrence data against a rand- 
omized ‘null’ assemblage, they quantified the 
fraction of species pairs in each assemblage 
that deviated from random expectations 
about where they should be found. Species 
pairs meeting this criterion provide valuable 


b Segregated species pair 
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telescopes, some with optical elements only 
20 centimetres in diameter, showing that, in 
astronomy, size is not necessarily what matters; 
collaboration does. m 


Poshak Gandhi is in the Department 
of Physics & Astronomy, University of 
Southampton, Highfield, Southampton 
SO17 1BJ, UK. 

e-mail: p.gandhi@soton.ac.uk 
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insight into the ecological processes that 
structure communities”. 

In modern communities, most species pairs 
show random co-occurrence, but those that 
are non-randomly associated are typically seg- 
regated — that is, they tend to co-occur less 
frequently than would be expected by chance” 
(Fig. 1). Lyons et al. wanted to know whether 
the fossil record is consistent with this pattern 
of species segregation. The headline finding is 
that the pattern of co-occurrence dominating 
modern communities departs sharply from 
that of the past. As in modern communities, 
co-occurrence of most pairs was random. 
But unlike in modern communities, the non- 
random associations were dominanted by 
aggregated species pairs, which co-occur more 
frequently than would be expected by chance” 
(Fig. 1). This dominance of aggregated pairs 
persisted with little change for more than 300 
million years on different contintents and 
across diverse taxa, until about 6,000 years 
ago, when the sharp transition to the segre- 
gated co-occurrence pattern began. 

After running a battery of tests to ascertain 
that this temporal trend was not an artefact, 


c Aggregated species pair 


that they co-occur more frequently than expected by chance. Among 
non-randomly associated pairs, Lyons et al.' documenta shift from a 
dominance of aggregated pairs before the expansion of human populations 
to the segregated pattern typically seen today. (Figure adapted 

from ref. 7.) 
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the authors speculate that an expanding 
human population may explain why species 
co-occurrence patterns are so different today. 
The shift was most obvious in North American 
assemblages (where the most occurrence data 
were available) and coincided with the inexo- 
rable spread of agriculture in this region. The 
authors propose that habitat fragmentation 
and limitations on species dispersal associated 
with land use were probably the main engines 
driving the shift. The structure of plant and 
animal terrestrial communities would never 
be the same again. 

This interpretation is sure to attract fervent 
debate and lead to further research to confirm 
the pattern and disentangle the proposed mech- 
anisms involved. The tension between the dis- 
tant past and the familiar present that the study 
highlights, however, has an underlying impli- 
cation that may not be as obvious. If the past is 
different from the present (in this case not in the 
immanent processes that were operating, but in 
their frequency), its applicability to our current 
societal need to anticipate ecological changes 
and design adaptation measures — a goal that 
Lyons et al. acknowledge is a priority — is not 
immediately manifest. There is no easy way 
around this tension. At stake is whether we can 
reliably use the past as a guide to an uncertain, 
anthropogenically modified future. 

A small cadre of voices argues that a human- 
dominated present limits the use of the past 
as a key to unlocking the future’. In this view, 
the world we live in today, and the immediate 
future that our grandchildren will inherit, has 
no analogue in the geological past. As a con- 
sequence, referencing ‘natural experiments’ 
in the distant past as a guide to predict what 
might happen, now or in the future, is a flawed 
strategy. Out is the use of uniformitarianism* 
as a guiding principle, and in is a new kind of 
‘post-normal’ science®. Lyons and colleagues’ 
study of human impacts on community- 
assembly rules, at least as implicated by species 
co-occurrence patterns, seems to embody 
evidence for this no-analogue world. 

A more optimistic view of this tension 
between the past and present — one that 
acknowledges that processes change and inter- 
act in complex ways over time, whether human 
action is involved or not — is that it poses a chal- 
lenge for how we select analogues from the past 
to gain insight into future conditions. Lyons and 
colleagues’ finding is a stark reminder that ana- 
logue selection often over-stresses likenesses at 
the expense of differences. However, small and 
unknowable differences in starting points may 
overwhelm the signal of the likenesses, making 
analogue selection a risky business. To use the 
past as a guide, we must select from the dense 
fabric of likenesses and differences that was its 
contingent state at a moment in time, and apply 
only those particular events and conditions 
relevant to our present needs. 

Moving beyond this tension will require 
creative ways of thinking about how we use 


the distant past to improve our understand- 
ing of the present and our anticipation of 
the future, which may provide a ground for 
wiser action. Lyons and colleagues’ study is 
an excellent entry point into thinking about 
this problem. = 


Gregory P. Dietl is at the Paleontological 
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USA and in the Department of Earth and 
Atmospheric Sciences, Cornell University, 
Ithaca. 
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Host protein clips bird 
flu’s wings in mammals 


The polymerase enzyme from avian influenza A viruses does not function well 
in human cells. The protein ANP32A has been identified as the cellular factor 
mediating a major component of this host restriction. SEE LETTER P.101 


ANICE C. LOWEN 


natural hosts, including mammalian and 

avian species. Yet transmission of these 
viruses between mammals and birds occurs 
only rarely, owing to host restriction: an influ- 
enza A virus that is adapted to an avian host 
typically does not grow well in a mammalian 
host, and vice versa. When such restrictions 
are overcome and an avian Virus transmits to 
humans, a pandemic can occur. On page 101 
of this issue, Long et al.' report a breakthrough 
in understanding the restriction of avian 
influenza viruses in mammals. 

The protein PB2 is anecessary component 
of the influenza A polymerase enzyme com- 
plex, which copies the viral genome and thus 
is essential for viral replication. For many 
years, researchers have known that a specific 
domain of PB2, the 627 domain, is involved in 
host restriction’. H5N1 strains and other ‘bird 
flv viruses rapidly acquire mutations in this 
domain following transmission to humans or 
inoculation of mammals in the laboratory™*. 
These mutations, in turn, greatly enhance 
the growth, virulence and transmission of 
avian influenza A viruses in mammals” *. Yet 
despite intense effort, the host factors and 
mechanisms that limit the functionality of 
non-mutated avian-adapted PB2 proteins in 
mammalian cells” * remained obscure. 

Long et al. knew from previous work” that 
the avian-adapted PB2 did not work well in 
mammalian cells because of the absence of a 
factor that enhances polymerase activity in 
avian cells, rather than because of the pres- 
ence of an inhibitory factor in mammalian 
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cells. To identify the missing positive factor, 
the authors used a panel of hybrid hamster 
cell lines that each carried a different frag- 
ment of the chicken genome. They expressed 
an avian-adapted PB2 protein, along with its 
essential viral partner proteins, in each cell 
line and measured the activity of the viral 
polymerase (Fig. 1). Out of 53 hybrid cell 
lines tested, four showed robust activity of the 
avian-adapted polymerase complex. By iden- 
tifying the chicken genes that were shared by 
these four cell lines, Long et al. narrowed their 
search for the positive avian factor to just 
12 genes. Then, by expressing each of the 
candidate genes singly in mammalian cells, 
the authors found what they were looking for: 
chicken ANP32A isa single gene that enables 
an avian-adapted PB2 protein to function 
efficiently in mammalian cells. 

Confirmation that ANP32A protein 
supports influenza-polymerase activity was 
obtained by decreasing the expression of 
ANP32A in cells. When levels were reduced 
in chicken cells, the activity of an avian- 
adapted viral polymerase decreased. Simi- 
larly, when expression of the mammalian 
version of ANP32A was reduced in human 
cells, a human-adapted viral polymerase was 
less active. Thus, ANP32A is crucial for influ- 
enza A virus replication in both birds and 
mammals, but avian-adapted polymerases 
work inefficiently with mammalian ANP32A. 
These findings indicate that the adaptive 
changes that influenza viruses acquire in 
the PB2 627 domain following transmission 
to mammals allow the viral polymerase to 
partner with mammalian ANP32A. 

The researchers report that chicken and 
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Figure 1 | A cellular factor for host restriction. The influenza virus’s 
polymerase enzyme complex is essential for viral replication in a host cell. 
One component of this complex, the protein PB2, contributes to host 
restriction — the fact that avian-adapted viral polymerases do not 
function properly in mammalian cells. To identify the host protein that 
partners with PB2 to cause restriction, Long et al.' used a panel of hybrid 


human ANP32A proteins are similar except 
for a stretch of 33 amino acids that is missing 
from the human protein. All avian ANP32A 
genes, except those of ostriches and other 
ratites, encode these 33 amino acids, whereas 
all mammalian versions lack this region. 
Fittingly, addition of this sequence to a mam- 
malian ANP32A protein was sufficient to 
permit avian-influenza PB2 function in mam- 
malian cells. With this finding, what is known 
of bird flu in ostriches now makes perfect 
sense: influenza viruses isolated from ostriches 
tend to carry a PB2 with a mammalian-like 
sequence in the 627 domain”. 

Long et al. have identified a host-cell protein 
that has an important function in the life cycle 
of influenza A viruses and that is a major 
factor in their host specificity. But it is still 
unclear how the virus uses ANP32A. The 
authors show that the protein does not alter 
the expression of PB2 nor its accumulation 
in the cell nucleus, where the viral genome 
is replicated. Is it instead directly involved in 
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the replication of viral RNA? Relatively little 
is known about the host requirements for this 
step in the life cycle of the virus. 

Investigating the precise relationship 
between PB2 and ANP32A will not only 
give insight into the mechanism of influenza 
host restriction, but may also trigger further 
discovery of virus—host interactions that 
contribute to viral RNA replication. More- 
over, the influence of adaptation in the 
PB2 627 domain on viral fitness suggests that 
disrupting the virus-ANP32A interaction 
could be a powerful means of controlling 
influenza infection. Therefore, elucidation of 
ANP32As role in virus replication in molecu- 
lar detail may open the way to the develop- 
ment of new antiviral drugs. = 


Anice C. Lowen is in the Department of 
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Sources of Chaco wood 


Tree rings can pinpoint the source of wood as well as how old it is. This method 
has now been used to identify the sources of timber used by the Native Americans 
who constructed the pre-Columbian ‘great houses’ of Chaco Canyon. 


JARED DIAMOND 


he largest buildings erected by Native 
Americans in North America before 
European arrival were those at Chaco 
Canyon in what is now New Mexico, where 
the Ancestral Puebloan culture thrived 
between about Ap 850 and 1140 (refs 1, 2). 
One of the unanswered questions that those 


buildings pose is: where did the Ancestral 
Puebloans obtain large logs for their build- 
ings in a desert with few trees? Writing in 
Proceedings of the National Academy of 
Sciences, Guiterman et al.* have used the 
method of tree-ring sourcing — not dating, 
but sourcing — to identify the origin as moun- 
tain forests more than 75 kilometres away and 
1,000 metres higher up. 
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hamster-chicken cell lines, each containing a different region of the 
chicken genome. They expressed avian-adapted influenza polymerase 
in these cells and measured its activity; by comparing the chicken 
genomic components of cell lines with polymerase activity, the authors 
identified the host gene ANP32A as underlying host restriction of viral 
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Modern visitors to Chaco are astonished 
that Ancestral Puebloan agriculture supported 
a complex society of thousands of people ina 
fragile dry environment where few try to farm 
today. Yet this culture succeeded for centuries, 
using sophisticated methods of managing 
water run-off from brief downpours. Their 
largest buildings, termed ‘great houses, rose 
up to six storeys and contained hundreds of 
rooms; they remained North Americas tallest 
buildings until steel construction finally 
permitted them to be topped by Chicago 
skyscrapers in 1885. 

Great houses were built primarily of sand- 
stone masonry, but they depended on wood for 
the beams of roofs, doors and windows (Fig. 1). 
Around a dozen great houses were constructed 
over three centuries, which required huge 
quantities of wood: about 240,000 trees were 
used, yielding beams up to 5 metres long and 
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Figure 1 | Great houses. a, The monumental architecture of the Ancestral Puebloan culture, from between around ap 850 and 1140, is evident at the ruins in 
Chaco Canyon, New Mexico. b, Built of sandstone masonry, the ‘great houses’ incorporated large wooden beams for roofs, doors and windows, but the desert 
environment lacked large trees. Guiterman et al.’ used tree-ring-sourcing methods to identify the distant mountain ranges from which this wood was obtained. 


weighing up to 300 kilograms’. But big trees 
are not abundant near Chaco today, and were 
probably not in Ancestral Puebloan times 
either. Furthermore, some trees used in the 
construction were spruce and fir, which grow 
only in mountain forests at elevations much 
higher than Chaco. 

Previous studies had identified the wood 
as coming from the Chuska Mountains and 
Mount Taylor, which are both more than 75 km 
in a straight line from Chaco. These origins 
were determined by comparing the ratios of the 
strontium isotopes “Sr and “Sr in great-house 
beams with the ratios in trees from local moun- 
tain ranges”®. (The ratio varies between trees 
from different mountains, depending on dif- 
ferences in the age and mineral content of the 
underlying rock.) The studies assumed from 
local palaeoecology that the Chaco Basin itself 
was unforested during Ancestral Puebloan 
occupation. But the isotope-analysis results and 
that assumption were subsequently criticized’. 

Guiterman et al. turned to tree-ring sourc- 
ing for further evidence. In areas with strongly 
seasonal climates, tree wood displays annual 
growth rings. Tree growth, and hence ring 
thickness, differs from year to year because of 
annual differences in temperature and rainfall. 
This process underlies the familiar method of 
dendrochronology, better known as tree-ring 
dating: comparing the tree-ring pattern in an 
archaeological wood sample with the pattern 
in a tree sampled in a known year. But tree 
growth can also vary locally in a given year, 
because of local differences in climate and 
topography. That fact enables tree rings to be 
used to identify the source of an archaeological 
wood sample, by controlling for date and then 
comparing the sample's tree-ring pattern with 
patterns of wood from different local sources. 
Tree-ring sourcing has been used in Europe 
to identify sources of wood for ships, musical 


instruments and paintings’, but the method 
has received less attention in North America. 

The authors assembled tree-ring patterns 
from eight mountains located in a circle 
around Chaco. The patterns at the sites dif- 
fered enough for wood to be identified to 
individual sources. The researchers then com- 
pared patterns from those sites with patterns in 
170 beams of 6 tree species from 7 great-house 
structures. 

It turns out that the patterns of most beams 
matched those of trees from the Chuska 
Mountains (42%) and the Zuni Mountains 
(28%). Sourcing patterns differed somewhat 
between tree species. For spruce and fir, the 
species used in the first strontium-based stud- 
ies**, the source deduced by Guiterman et al. 
agreed with that previously identified. Inter- 
estingly, the main source shifted with time, 
from the Zuni Mountains before ap 1020 to 
the Chuska Mountains thereafter. The stron- 
tium studies had not sampled modern trees in 
the Zuni Mountains because of their greater 
distance from Chaco, but they agree with 
the tree-ring-sourcing results that the closer 
Chuska Mountains were the main source 
of ponderosa pine, the species used in most 
Chaco beams. No Chaco beams match the ring 
patterns of isolated stands of ponderosa pine 
nearby at Chaco’ elevation. 

Although Guiterman and colleagues’ study 
solves one mystery about Chaco, it brings 
others into focus. The first concerns the dis- 
tance of the wood sources. Archaeologically 
identified roads radiating from Chaco were 
presumably the transport routes, but how did 
people without draught animals transport 
5-m-long, 300-kg beams down from moun- 
tain forests 1,000 m higher than Chaco, and 
then 75km across land, while leaving almost 
no scratches on the wood? 

A second quandary stems from our 
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knowledge of other ancient construction pro- 
cesses. To build the enormous tall dome of the 
medieval cathedral in Florence, Italy, heavy 
weights were raised using a machine consist- 
ing of winches, pulleys, gears and wheels, and 
turned by oxen’. The Ancestral Puebloans 
lacked all five of these, so how did they raise 
long, heavy roof beams to heights of six storeys? 

Third, why did the wood source switch from 
the Zuni Mountains to the Chuska Mountains 
a century before Chaco was abandoned by 
its inhabitants? The switch coincided with 
a building boom at Chaco, when seven new 
great houses were erected, and pottery and 
stone tools also began to be imported from the 
Chuska Mountains. Might the shift have been 
caused by deforestation of the Zuni Mountains, 
by the closer location of the Chuska Moun- 
tains to Chaco or by social developments in 
the source areas? 

The seasonal climate, dry conditions and 
good preservation of archaeological wood 
in much of western North America make 
the region well suited for tree-ring-sourcing 
studies. Now that Guiterman et al. have dem- 
onstrated the method’s value at Chaco, we 
may hope for many more sourcing studies by 
archaeologists and historians. m 
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Rare isotopic insight 
into the Universe 


Light isotopes of hydrogen and helium formed minutes after the Big Bang. The 
study of one of these primordial isotopes, helium-3, has now been proposed as a 
useful strategy for constraining the physics of the standard cosmological model. 


NIKOS PRANTZOS 


he accepted theory of cosmology, 

known as the standard cosmological 

model, invokes the existence of a hot 
early Universe about 13.7 billion years ago. At 
that time, matter (elementary particles) and 
radiation (photons) coexisted as an essentially 
amorphous plasma from which nuclei, atoms, 
stars and galaxies progressively formed. The 
observation of ‘relics’ from that period, and 
their comparison with theoretical predictions, 
allowed the standard model to be established, 
and helps scientists to probe the physics of the 
Universe and to determine the 
values of its fundamental proper- 
ties. Writing in The Astrophysical 


Journal Letters, Cooke’ suggests c 
that observations of the abun- 4.0- 
dance of one such relic, the rare B 
helium isotope He, might pro- 3.5, 
vide information about the num- E 
ber of low-mass particle species go Ag E 
in the Universe, thus constrain- _- 
ing the standard model of nuclear [ 
and particle physics. oc 

The hot early Universe left [ 
two types of major relic: the 2.0 


cosmological relics depend on the physics of 
the early Universe. For instance, the abun- 
dances of primordial deuterium and *He 
depend sensitively on the density of normal 
(baryonic) matter at that time: the higher 
the density, the less deuterium and *He are 
produced, because they are more frequently 
destroyed by primordial nuclear reactions. 
Similarly, the morphology of the ripples 
detected in the CMB depends strongly on the 
cosmic baryonic density. 

The primordial abundance of “He is more 
sensitive to the expansion rate of the early 
Universe than to its baryonic density. That 


faint glow of microwave photons 
known as the cosmic microwave 
background (CMB), which is 
almost the same in all directions 
of the sky; and the light elements 
hydrogen and helium. These ele- 
ments consist of the abundant 
isotopes 'H and “He, and the 
rare ones, 7H (deuterium, also 
abbreviated to D) and *He. All 
of these isotopes were produced 
by a process called Big Bang 
nucleosynthesis (BBN), through 
nuclear reactions between pro- 
tons and neutrons during the 
first few minutes of the hot early 
Universe. 

According to theory, the pres- 
ently observed properties of the 
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Figure 1 | Constraining the parameters of the standard cosmological 

model. The abundances of nuclei produced during Big Bang nucleosynthesis 
essentially depend on two parameters: the density of normal (baryonic) matter, 
OQ, and the effective number of neutrino species, N,q. The values of Q3,, and 
N. can be constrained from measurements of the abundance ratio of deuterium 
to hydrogen (D:'H) in near-primordial environments (blue regions indicate 
constrained values obtained from D:'H ratios). Cooke’ proposes a different 
method for constraining these parameters, using measurements of the ratio of 
the yet-to-be-determined primordial abundances of helium-3 and helium-4 
isotopes (*He:*He; green regions indicate constraints based on measurements of 
>He:*He values for meteorites that formed at the same time as the Solar System, 
4.6 billion years ago). Taken together, the two approaches constrain Q,,, and 

¢« much more than can either individual approach (orange regions indicate 
combined constraints). Dark and light shades of the coloured regions 

indicate confidence limits of 68% and 95%, respectively. Q,,, is conventionally 
expressed as its product with h’, where h is the Hubble parameter divided by 100. 
(Figure adapted from ref. 1.) 
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rate depends, in turn, on the number density of 
photons and other relativistic particles, includ- 
ing electrons, positrons and three flavours of 
neutrino in the standard model of particle 
physics. The sum total of all those species is 
usually parameterized by an ‘effective number 
of neutrino species, N.g. In the standard model, 
N.¢ is 3.046, but its value can be different in 
non-standard models that predict the forma- 
tion of new particle species. 

Theoretical predictions of BBN have 
improved considerably over the years, and all 
of the relevant nuclear reaction rates have been 
measured in the laboratory”. But comparison 
of these predictions with observations requires 
the primordial abundances of the light nuclei 
to be reliably established, which is difficult to 
do. After more than 13 billion years of cosmic 
evolution, the abundances of all elements in 
the Universe have been altered by the work- 
ings of stars: those of 'H and deuterium are 
reduced compared with primordial abun- 
dances, because stars ‘burn’ these isotopes in 
nuclear reactions, whereas the abundances 
of all other isotopes have steadily increased 
because they are produced by stars. Regions of 
the Universe that have evolved very little must 
therefore be sought if primordial abundances 
are to be established. 

In the case of deuterium, which 
is the most sensitive chemical 
probe of baryonic density, obser- 
vations are made in remote gas 
clouds more than 10 billion light 
years (about 3 billion parsecs) 
away, and therefore more than 
10 billion years old. The low 
content of ‘metals’ (defined by 
astronomers as elements heav- 
ier than helium) in such clouds 
ensures that their composi- 
tion is barely affected by stellar 
activity. The observed isotopic 
ratio of deuterium to 'H shows 
little variation around the aver- 
age observed value’, and points 
to a baryonic density of 4.5% of 
the critical cosmic density (the 
density value that determines 
whether the Universe is open — 
expanding forever — or closed). 
This is in excellent agreement 
with the value determined from 
the latest CMB observations by 
the European Space Agency’s 
Planck mission’. 

“He is conventionally used as a 
probe of N.g. The abundance of 
this isotope is measured through 
the intensity of its emission 
lines in the gas spectra of nearby 
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50 Years Ago 


In the market of Fort Lamy (Chad) 
one can purchase a greenish edible 
substance called Dihe which is sold 
asa flat cake. ... It appears to be 
an alga collected on the bottoms 
of seasonally dried-up ponds and 
shallow waters in the north of Lake 
Chad and consumed by the local 
population. 
However, on arriving in... 
Ounianga Kebir ... more than 
750 miles to the north-east of Fort 
Lamy, the botanist was struck by the 
abundance of a microscopic alga in 
some lakes. ... Although the local 
population appears to be unaware 
that it might have a food value, the 
botanist ... prepared some cakes 
according to the recipe obtained. 
Both cakes ... are almost 
exclusively composed of a 
Cyanophycea: Spirulina platensis. 
According to chemical analysis it 
appears that it is a food-plant very 
rich in proteins. 
From Nature 8 January 1966 


100 Years Ago 


The popularisation of Science. 
It is scarcely surprising that 
scientific knowledge is so little 
disseminated in this country 
considering the difficulties 
which hinder its acquisition. If 
science is to become widespread, 
it seems to me essential that it 
should be democratic both in its 
higher and in its lower branches. 
In England, however, science 
may be said to be aristocratic. 
Scientific societies demand more 
or less high subscriptions. Public 
lectures on science are rarely free. 
In London an institution exists 
where advanced lectures are given, 
but the subscription to which 

is considerable, and to become 
members of which people actually 
have to be recommended— 
recommended to be allowed to 
learn! 

From Nature 6 January 1916 
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galaxies that have low metal content, but those 
measurements are affected by systematic uncer- 
tainties. Even worse, the latest analyses point to 
a primordial “He abundance that seems to be 
significantly higher than the one suggested by 
the Planck mission's CMB study”®. 

To resolve this problem and to reduce the 
uncertainties, Cooke proposes that the usu- 
ally neglected primordial isotope *He should 
be included in the analyses. According to BBN 
theory, the ratio of the primordial abundance 
of *He to that of “He depends on both N,; and 
the cosmic baryonic density, in a way that is 
opposite to the dependence of the ratio of deu- 
terium to 'H; that is, *He:*He decreases with 
Np whereas D:'H increases. So, by combining 
analyses of both the hydrogen and helium iso- 
topic ratios, the value of N.,,can be constrained 
better than by using either the abundance of 
“He or the D:'H ratio alone (Fig. 1). 

Implementing this idea is far from trivial, 
however, on both observational and theoreti- 
cal grounds. First, uncertainties in nuclear- 
reaction rates will have to be further reduced 
to make *He a useful probe for precision 
cosmology. Second, unlike deuterium, which 
is always destroyed by stars, “He is produced by 
low-mass stars but destroyed by higher-mass 
ones, to a poorly known extent. This makes 
it difficult to determine its primordial abun- 
dance unambiguously, even by looking in low- 
metallicity environments. 

Moreover, *He is 10,000 times less abun- 
dant than “He, and so its weak emission line 


will be hard to identify in the background of 
the much brighter *He line — especially if the 
latter is broadened by rapid thermal or turbu- 
lent motions of the emitting gas. A statistically 
significant detection of *He would require a 
high signal-to-noise ratio, of more than 500. 
This will be obtainable only using the next gen- 
eration of telescopes, which will have mirrors 
30 metres or more in diameter. 

Nevertheless, Cooke’s suggestion is of great 
interest, because the standard cosmologi- 
cal model should be checked as accurately as 
possible with every available method, in view 
of its prominent role in modern physics. In 
particular, Cooke’s strategy should allow 
potential departures from the standard model 
to be probed in a complementary way to 
existing strategies. m 
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Oncogene brought 


into the loop 


Analysis of the 3D structure of DNA in tumour cells reveals how mutations in the 
IDHI gene, and associated changes in methyl groups attached to DNA, elevate the 
expression of cancer-promoting genes. SEE LETTER P.110 


MATTHEW R. GRIMMER 
& JOSEPH F. COSTELLO 


" vn discovery in the late 2000s that 
mutations in the gene that encodes the 
enzyme isocitrate dehydrogenase 1 

(IDH1) are often associated with glioma, 

the most common form of brain cancer, was 

unexpected and tantalizing’’. The IDH1 pro- 

tein is involved in the citric-acid cycle —a 

metabolic process that is used by nearly all 

cells to generate energy, and that in 2008 had 
only recently been connected to cancer™. 

The discovery therefore supported the long- 

standing theory that altered metabolism 

could transform normal cells into cancerous 
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ones. On page 110 of this issue, Flavahan 
et al.” report that an abnormal metabolite 
generated by mutant IDH1 may drive cancer 
primarily by altering the 3D conformation 
of DNA. 

Mutant IDH1 converts the citric-acid- 
cycle molecule isocitrate into an abnormal 
metabolite that inhibits TET enzymes’, 
which remove methyl groups from DNA. 
The presence of methyl groups can alter 
gene expression by preventing some proteins 
from binding DNA, and an excess of methyl 
groups in promoter sequences (which drive 
gene expression) can silence tumour-sup- 
pressor genes, leading to cancer. It has been 
suggested’ that inhibition of TET enzymes 


leads to such hypermethylation in IDH1- 
mutant tumours. However, promoter hyper- 
methylation in these tumours is not generally 
correlated with changes in gene expression’, 
suggesting that cancer-associated changes 
in methylation may occur at other DNA 
sequences. 

In addition to promoter regions, gene 
expression can be regulated by the 3D struc- 
ture of chromatin (the complex in which 
DNA is wound around histone proteins for 
packaging in the cell). Chromatin structure 
is exceptionally intricate, and is defined in 
part by evolutionarily conserved loops called 
topologically associated domains (TADs). 
Interactions between DNA sequences — for 
instance, those that bring promoters into 
contact with distant enhancer elements to 
activate gene expression — are more com- 
mon within than between TADs, and there is 
evidence’ that gene expression is coordinated 
in these loops. 

TADs are insulated from one another by 
DNA-binding proteins such as the CCCTC- 
binding factor (CTCF). Deletion of the DNA 
sequence encompassing one CTCF binding 
site has been shown to cause changes in TAD 
structure and gene expression that lead to 
limb malformations”, highlighting the 
importance of maintaining these bounda- 
ries. Notably, CTCF binding is sensitive to 
changes in DNA methylation'’”. 

Flavahan et al. demonstrated that a subset 
of CTCF binding sites is methylated in IDH1- 
mutant gliomas, and that CTCF binding at 
these sites is subsequently reduced. Lever- 
aging gene-expression data from hundreds 
of gliomas and normal brain specimens, 
and using 3D chromosome-conformation 
data from various cell lines, the authors 
found previously unknown gene-expression 
correlations between TADs in IDH1-mutant 
gliomas, suggesting that TAD borders are 
disrupted. 

Hundreds of the pairs of genes that are 
correlated in the mutant cells straddle a dis- 
rupted TAD border. Of these, PDGFRA and 
FIPIL1 are among the most highly expressed. 
PDGERA is an appealing candidate for fur- 
ther study, because it is a well-documented 
oncogene (it promotes cancer when muta- 
tionally activated or overexpressed) and 
is amplified genetically in some 20% of 
advanced (high-grade) gliomas’’. The 
authors find that, in JDH1-mutant glio- 
mas, which are low grade, the CTCF site at 
the TAD boundary between PDGFRA and 
FIP1L1 is methylated and CTCF binding 
is reduced. Thus, an increase in PDGFRA 
expression, although arising through dif- 
ferent mechanisms in low- and high-grade 
tumours, may be a common theme in glioma. 

Flavahan and colleagues showed that, in 
glioma cells in which IDH1 is not mutated, 
the PDGFRA promoter strongly interacts with 
its own enhancer. The interaction patterns are 
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Figure 1 | Breaking down boundaries to cancer. Structural boundaries between regions of chromatin 
(the complex of DNA and proteins in which DNA is packaged in the nucleus) define loops called 
topologically associated domains (TADs), within which gene activity is coordinated. DNA binding by the 
insulator protein CTCF separates these domains. Flavahan et al.° provide evidence that CTCF insulation 
prevents the activation of oncogenes (genes whose hyperactivity promotes cancer) by distant enhancer 
elements from different TADs. The authors find that mutations in the gene IDH1 increase the number of 
methyl groups that are attached to CTCF binding sites, reducing CTCF-DNA binding. This breaks down 
the TAD border structure, allowing aberrant association between enhancers and the promoter regions of 
oncogenes. Oncogene expression is subsequently amplified, leading to cancer. 


markedly different in [DH1-mutant tumours. 
Here, there is a strong interaction between 
the PDGFRA promoter and the unrelated 
enhancer of FIP1L1, despite the fact that these 
two genetic elements are separated by almost 
900,000 base pairs. This aberrant interaction 
is approximately five times stronger than that 
between the PDGFRA promoter and its own 
enhancer. Together, these results suggest that 
disruption of a boundary element by hyper- 
methylation allows a potent FIPIL1 enhancer 
to interact with the PDGFRA promoter, 
increasing gene expression (Fig. 1). 

To confirm that DNA hypermethylation is 
responsible for the elevated PDGFRA expres- 
sion that they observed, the authors treated 
IDH1-mutant cells with a drug that reduces 
DNA methylation. In agreement with their 
hypothesis, the treatment reduced methylation 
of the relevant CTCF binding site, increasing 
CTCF binding and reducing PDGFRA expres- 
sion. Conversely, experimental disruption of 
the CTCF binding site in cells that lacked the 
IDH1 mutation led to increased PDGFRA 
expression. The altered expression presum- 
ably occurs because of changes in enhancer- 
promoter interactions, but this was not tested 
directly. Elevated PDGFRA expression dou- 
bled cell growth compared with untreated 
cells. This suggests that the increased PDGFRa 
protein in IDH1-mutant glioma cells provides 
a selective growth advantage over cells lacking 
the mutation. 

Flavahan and colleagues’ study focuses on 
one CTCE site out of hundreds, so other onco- 
genes might also be activated by newly formed 
enhancer-promoter interactions in IDH1- 
mutant tumours. Many newly activated genes 
may also be ‘passenger’ events, which have no 
functional consequences. The methylation 
states of CTCF sites and the activity of enhanc- 
ers vary widely across cell types, suggesting 
that 3D chromosome-conformation analysis 
of high-grade gliomas, colorectal cancers, lym- 
phomas, leukaemias and other JDH1-mutant 


cancers could reveal different targets of 
genomic hypermethylation. These targets may 
also include those normally bound by methyla- 
tion-sensitive factors other than CTCE. 

Consistent with the fact that DNA methyl- 
ation is highly stable, aberrant hypermeth- 
ylation persists in IDH1-mutant tumours after 
treatment with an inhibitor of mutant IDH1 
(ref. 14). Assuming that hypermethylation 
is involved in the transition to cancer, as is 
strongly suggested by the current study, such 
stability could pose a challenge for the success of 
IDH1-inhibitor treatments in patients. Unrav- 
elling the effects of DNA hypermethylation 
on gene dysregulation will lead to a more com- 
plete survey of the forces downstream of TET 
and other enzymes that drive the evolution of 
IDH1-mutant cancer cells’. Flavahan and col- 
leagues’ study provides a fresh perspective on 
which to base such future analyses. m 
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Autophagy maintains stemness by 
preventing senescence 


Laura Garcfa-Prat!, Marta Martinez-Vicente?*, Eusebio Perdiguero!, Laura Ortet', Javier Rodriguez-Ubreva*, Elena Rebollo*, 
Vanessa Ruiz-Bonilla!, Susana Gutarra!, Esteban Ballestar*, Antonio L. Serrano!, Marco Sandri>»** & Pura Mufioz-Canoves!” 


During ageing, muscle stem-cell regenerative function declines. At advanced geriatric age, this decline is maximal 
owing to transition from a normal quiescence into an irreversible senescence state. How satellite cells maintain 
quiescence and avoid senescence until advanced age remains unknown. Here we report that basal autophagy is 
essential to maintain the stem-cell quiescent state in mice. Failure of autophagy in physiologically aged satellite cells 
or genetic impairment of autophagy in young cells causes entry into senescence by loss of proteostasis, increased 
mitochondrial dysfunction and oxidative stress, resulting in a decline in the function and number of satellite cells. 
Re-establishment of autophagy reverses senescence and restores regenerative functions in geriatric satellite cells. As 
autophagy also declines in human geriatric satellite cells, our findings reveal autophagy to be a decisive stem-cell-fate 
regulator, with implications for fostering muscle regeneration in sarcopenia. 


The regenerative capacity of skeletal muscle relies on long-lived Pax7- 
expressing muscle stem cells (called satellite cells), which are normally 
in quiescence (a GO reversible arrest state). In response to tissue dam- 
age, these cells activate, enter the cell cycle and either expand and 
form new myofibres or self-renew to restore the quiescent satellite cell 
pool. Quiescence therefore appears to be a simple way of functionally 
maintaining the stem-cell population throughout life in the absence of 
regenerative demand, particularly in tissues with little turnover, such 
as skeletal muscle. 

Sarcopenia, the age-related loss of skeletal muscle mass and function, 
is maximal at geriatric age. At this last stage of life, skeletal muscle 
shows a profound regenerative impairment that contributes to the indi- 
vidual’s physical incapacitation. Both changes in the environment (such 
as inflammatory status) and/or satellite-cell-intrinsic mechanisms asso- 
ciated to ageing may contribute to this regenerative decline*®. Recent 
studies have demonstrated that aged skeletal muscles fail to retain stem- 
cell quiescence’~’. Both the number and the functionality of muscle 
stem cells decline with ageing”~', with satellite cells switching from 
a quiescence to a pre-senescence state in sarcopenic muscle at geriat- 
ric age®. How satellite cells maintain quiescence during their long life 
and avoid acquisition of the senescence program until advanced age 
is largely unknown. 

Using physiologically aged mice, we show that quiescent muscle 
stem cells preserve their integrity over time through active mainte- 
nance of organelle and protein homeostasis (proteostasis) as a cellular 
quality control mechanism. We demonstrate that these dormant stem 
cells display continuous basal macroautophagy (hereafter referred to 
as ‘autophagy’; that is, the process for degradation of long-lived pro- 
teins and damaged organelles in lysosomes’*"»). This activity declines 
during ageing. Physiological decline of autophagy in old satellite cells 
or its genetic impairment in young cells, results in toxic cellular waste 
accumulation, resulting in entry into senescence. 

Our studies indicate that muscle stem cells preserve their 
G0-reversible quiescence state from entering a GO-irreversible 


senescence state through autophagy. Genetic and pharmacologi- 
cal regimes that reinstall basal autophagy in geriatric mice reversed 
stem-cell senescence and restored regeneration, which has implications 
for the use of regenerative medicine in sarcopenia. 


Impaired autophagy in aged satellite cells 

We interrogated the transcriptomes of quiescent satellite cells com- 
pared to activated cells for changes in proteostasis genes'®'® and 
uncovered autophagy as the most prevalent pathway in the quiescent 
state (Extended Data Fig. la and Supplementary Table 1). K-means 
clustering analysis revealed an age-associated downregulation of auto- 
phagic genes in quiescence (Extended Data Fig. 1b and Supplementary 
Table 1). 

Autophagy is an evolutionary conserved process of self-degradation 
of cellular components (organelles, cytosol portions and misfolded 
proteins) by autophagosomes, which are delivered to the lysosomal 
machinery, thus preventing waste accumulation", and this process 
has been implicated in ageing of different model organisms!*'>!9°, To 
investigate the occurrence of autophagy in quiescent muscle stem cells 
we used green fluorescent protein (GFP)-LC3 (a well-known marker 
of autophagosomes) transgenic mice*!””. Quiescent satellite cells were 
isolated by fluorescence-activated cell sorting (FACS) (Extended Data 
Fig. 1c) from the resting muscle of young (3 months) and old (20-24 
months) GFP-LC3 mice. Punctate GFP-LC3 signal was found in young 
cells, and this was increased in old cells (Fig. 1a, Extended Data Fig. 1d 
and Supplementary Videos 1 and 2). We next used the autophagy-flux 
inhibitor bafilomycin, which prevents lysosome degradation, thus 
increasing punctate GFP-LC3 exclusively when autophagy is active’. 
Bafilomycin treatment demonstrated that—in contrast to the result 
for young cells—old satellite cells lacked the capacity for further auto- 
phagosome formation, as monitored by GFP-LC3 fluorescence levels 
(Fig. 1b). These results indicate constitutive autophagic activity in 
young quiescent satellite cells and impaired autophagic activity during 
ageing. Fluorescence, transmission-electron microscopy and western 
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Figure 1 | Altered basal autophagy in ageing muscle stem cells. 

a, Number and area of punctate GFP-LC3 in quiescent satellite cells. 
Arrowheads, autophagic vesicles. b, Autophagy flux in cells from a. Cells 
were treated with vehicle or bafilomycin (+Baf) for 4h before analysis. 
The change in mean fluorescence intensity (MFI) of GFP-LC3 in cells 
treated with bafilomycin is shown. c, Autophagy flux in quiescent satellite 
cells from old GFP-LC3 mice treated for two weeks with rapamycin or 
vehicle control. Satellite cells + bafilomycin treatment as in b. d, p62 and 
ubiquitin (Ub) MFI from cells of the GFP-LC3 mice treated as in c. AU, 
arbitrary units. Data show mean + s.e.m. Comparisons by two-sided 
Mann-Whitney U-test. Sample numbers were n= 51 (young), n = 106 
(old) cells analysed from 3 animals for a; n = 60,000 cells from 3 animals 
for b; n = 60,000 cells from 3 animals for c; n = 36 (control), 39 (rapamycin 
treated) cells from 3 animals for d. The z projections of representative 
images are shown. Scale bars are all 1.5 j1m, apart from 5\1m for c. 
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blotting analyses indicated common traits of deficient autophagy in 
old satellite cells, including the accumulation of autophagic vesicles 
(Extended Data Fig. le, f), aggregates forming of p62 (a protein reg- 
ulating autophagic clearance of dysfunctional organelles or aggre- 
gates), ubiquitin (Ub)-positive inclusions (Extended Data Fig. 1g), 
reduced LC3II accumulation after bafilomycin treatment (Extended 
Data Fig. 1h). A two-week-treatment in old mice with rapamycin (or 
spermidine), well-known autophagy-inducing regimes”*”°, restored 
basal autophagy in stem cells (Fig. 1c, Extended Data Fig. li and 
Supplementary Videos 3 and 4) and reduced protein and organelle 
aggregates (Fig. 1d and Extended Data Fig. 1)). 


Restoring autophagy prevents senescence 

Satellite cells enter a senescent state when they reach a geriatric age 
(over 28 months in mice)®”°. We investigated whether dysregulated 
basal autophagy may underlie the loss of bona fide quiescence. Using 
an mRFP-GFP-LC3 construct” (a tandem fluorescent-tagged LC3 
reporter containing monomeric red fluorescent protein (mRFP) and 
GFP), transfected into young, old and geriatric satellite cells, analys- 
ing these samples in combination with bafilomycin treatment, we 
found a higher blockade of autophagic flux in geriatric than old cells, 
with respect to young cells (the blockade is geriatric > old > young). 
In the absence of bafilomycin, red LC3 puncta (mature autolyso- 
somes) were only abundant in young cells. Bafilomycin treatment 
induced yellow LC3 puncta (non-fused autophagosomes) accumu- 
lation in young cells, which was blunted in old and geriatric cells 
(Fig. 2a). Geriatric satellite cells also showed increased co-localization 
of p62-ubiquitin aggregates in non-degraded autophagosomes 
(Fig. 2b). As p62 marks damaged organelles for degradation by 
selective autophagy, whereas ubiquitin marks substrates for their 
degradation by either the ubiquitin-proteasome system (UPS) or 
selective autophagy, the increased signal of both proteins and their 
co-localization demonstrates that the autophagic defect in these cells 
is due, at least in part, to a block in autophagosomal or lysosomal 
clearance. 
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Figure 2 | Defective autophagy causes numerical and functional satellite 
cell decline in ageing. a, The mRFP-GFP-LC3 plasmid was transfected 
into young (3 months), old (24 months) and geriatric (28 months) 

satellite cells to enable the detection of autophagosomes (yellow) and 

their maturation into autolysosomes (red) in the presence or absence of 
bafilomycin treatment (as in Fig. 1b). The graph indicates the percentage 
of double-positive puncta (RFP*/GFP*) (autophagosomes) out of total 
puncta (RFP*/GFP*, RFP*, autophagosomes and autolysosomes). 

b, Quantification of p62 and ubiquitin aggregates in quiescent satellite 
cells from a. Co-localization staining area with respect to total cellular 
area. Pearson's coefficient (r) indicates the correlation of intensity values 
of red and green pixels in dual-channel images. Arrowheads indicate 
co-localization. c, An equal number of LV-GFP-infected satellite 

cells from young or geriatric mice, treated for 48 h with rapamycin or 
LV-Atg7 infected (or controls), were transplanted into an injured mouse 
muscle, and analysed 4 or 28 days later (for analysis on day 28, 

see Extended Data Fig. 2i, j). Analysis on day 4 of GFP and Pax7 
immunostaining. Quantification of GFP* cells per muscle field. Values 
relative to transplanted young cells (100%). Data show mean + s.e.m. 
Comparisons by two-sided Mann-Whitney U-test. The sample numbers 
were n= 21 (young), n= 19 (young, +-bafilomycin), n = 30 (old), n= 15 
(old, +bafilomycin), n = 21 (geriatric) and n= 15 (geriatric, +-bafilomycin) 
cells analysed from 3 animals for a; n = 35 (young), n= 66 (old) and 

n= 104 (geriatric) cells from 3 animals for b; n =5 engraftments per group 
for c. Representative images are shown. The z projections of representative 
images are shown. Scale bars are all 51m, apart from 50\1m for c. 
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To investigate whether restoring autophagy could rescue the cell-in- 
trinsic irreversible cell cycle and regenerative block of geriatric cells, 
we engrafted freshly isolated GFP-labelled young and geriatric satellite 
cells (pre-treated with rapamycin or the control vehicle) into pre-injured 
muscles of young recipient mice. Autophagy reactivation significantly 
restored expansion of geriatric cells (expressing Pax7, Ki67, MyoD or 
myogenin (Mgn, also known as Myog)) after a four-day engraftment 
(Fig. 2c and Extended Data Fig. 2a-c) and prevented senescence 
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(geroconversion), as shown by p 16'NK4 and ~H2AX reduction (Extended 
Data Fig. 2d). Rapamycin (or spermidine) treatment also decreased 
geriatric senescent cells (senescence-associated 3-galactosidase- 
positive (SA-3-gal*)) (Extended Data Fig. 2e, f) and re-established 
proliferation (Extended Data Fig. 2e). A genetic approach to enhance 
autophagy, by overexpressing Atg7 (crucial for autophagosome for- 
mation) (Fig. 2c and Extended Data Fig. 2g, h) rescued the prolif- 
erative defect, while reducing senescence (Extended Data Fig. 2e). 
Furthermore, satellite cell transplantation and whole-muscle graft 
experiments demonstrated that the introduction of Atg7 alone in ger- 
iatric satellite cells rescued their intrinsic regenerative capacity, allow- 
ing the formation of new muscle fibres (Fig. 2c and Extended Data 
Fig. 2i, j). 


Atg7 loss causes stem-cell senescence 

To investigate if basal autophagy disruption causally breaks quiescence, 
we intercrossed Atg7-floxed mice with Pax7-Cre and Pax7-Cre™® mice, 
to impair autophagy in Pax7-expressing cells either constitutively 
(Atg74? ®x7) or inducibly (Atg7“? ax7ER) after tamoxifen administra- 
tion. Intercrossing Atg7*?**’ with GFP-LC3 mice (Atg74?*7:GFP- 
LC3) resulted in the loss of autophagosomes in quiescent Atg7 null 
satellite cells (Extended Data Fig. 3a, b). The satellite cell pool was 
severely reduced in Atg7?**’ mice (Fig. 3a and Extended Data Fig. 3c). 
Tamoxifen administration to three-month-old Atg74?*7#® mice led to 
satellite cell loss after 30 days (Fig. 3b), indicating that basal autophagy is 
required for both establishment and maintenance of the adult quiescent 
stem-cell population. The remaining Atg7?**/ satellite cells showed 
unexpected signs of premature ageing including induction of p16N**, 
p21?! and plsiNk+ and DNA damage (yH2AX* cells) (Fig. 3c, d 
and Extended Data Fig. 3d). Atg7“"**” satellite cells did not undergo 
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Figure 3 | Genetic impairment of autophagy disrupts satellite 

cell homeostasis. a, Satellite cell quantification by analysis of Pax7 
immunostaining of muscles in three-month-old Atg7“' and Atg74?**7 
mice. b, Schematic of the mouse tamoxifen (Tmx) treatment and satellite 
cell analysis. Satellite cell quantification in Atg71/Atg7*?**7ER mice 

on day 30 after tamoxifen treatment as in a. c, RT-qPCR of senescence 
markers (7 days after tamoxifen treatment) in cells from b. d, The 
percentage of co-localizing YH2AX* cells out of the total Pax7* cells in a. 
e, f, Quantification of BrdU* (e) and SA-B-gal* (f) cells isolated from 
mice in a. The arrowhead indicates positive staining. g, Western blotting 
of cells from a; for a full scan of the gel see Supplementary Fig. 1. pS6, 
phosphorylated S6. Data show mean + s.e.m. Comparisons by two-sided 
Mann-Whitney U-test. Sample numbers were: n =5 animals per group for a; 
n=3 animals per group for b-g. Representative images are shown. The 

z projections of representative images are shown. Scale bars, 250 1m. 
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mitotic or myogenic differentiation pathways (Extended Data Fig. 3e). 
Thus, loss of autophagy with ageing may be the cause underlying the 
age-associated numerical decline in muscle stem cells”*!°"¥°, 

In response to muscle injury, Pax7* cells from young Atg7“?*” 
mice showed reduced activation and expansion capacity (Fig. 3e and 
Extended Data Fig. 3f), and accelerated entry into deep senescence*8 
(geroconversion”’) in vivo and in vitro, as demonstrated by SA-(- 
gal-staining (Fig. 3f), and increased expression of yH2AX and phos- 
phorylated S6, and also evidence of regenerative failure, shown by 
reduced cell proliferation and decreased size of regenerating fibres 
(Fig. 3g and Extended Data Fig. 3g-l). Confirming the cell-intrinsic 
regenerative failure, fewer GFP* fibres derived from Afg7 null satel- 
lite cells were found in transplantation experiments (Extended Data 
Fig. 3m, n), and this failure could not be rescued by rapamycin 
(or spermidine) (Extended Data Fig. 3m-o). 


Altered mitophagy and increased ROS induce senescence 
We next investigated how loss of autophagy in young quiescent satellite 
cells induced premature ageing. Genetic disruption of autophagy in 
satellite cells resulted in a similar phenotype to that observed in aged 
cells with rapid accumulation of p62 and ubiquitin-positive aggregates, 
and also mitochondria and lysosomes (MitoTracker, and LysoTracker 
and Lamp1) (Fig. 4a, b and Extended Data Fig. 4a). There was also a 
lower proportion of healthy mitochondria in old (and Atg74?®”F8) sat- 
ellite cells, as revealed by reduced membrane potential (a lower mean 
fluorescence intensity ratio of the active mitochondria labelling fluo- 
rescent dye TMRM to MitoTracker green) (Fig. 4a, b). Furthermore, 
mitophagy (the cellular capacity to clear damaged mitochondria by 
autophagy) was defective in geriatric satellite cells, as indicated by 
mitochondria accumulation inside autophagosomes or lysosomes 
(through co-localization of mitochondrial TOM20 and lysosomal 
Lamp-1 markers) (Extended Data Fig. 4b). In vivo rapamycin (or sper- 
midine) treatment of geriatric mice restored mitophagy in satellite cells 
(Extended Data Fig. 4b-f). Consistent with age-impaired mitophagy, 
young, but not geriatric, cells were capable of eliminating carbonyl 
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Figure 4 | Autophagy loss results in mitochondrial dysfunction and 
accumulation of organelles, proteins and ROS. a, Lysosomal (Lamp1 and 
LysoTracker) and mitochondrial (MitoTracker) quantification in satellite 
cells from Atg77 and Atg74?#*’E® mice, one month after tamoxifen 
treatment. Membrane potential monitored as the mean fluorescent 
intensity ratio of TMRM to MitoTracker Green. b, Similar quantification 
as in a for young and old satellite cells. Membrane potential analysis as in a. 
Data show mean + s.e.m. Comparisons by two-sided Mann-Whitney 
U-test. The sample numbers were n = 60,000 cells analysed from 

3 animals for a and b. Representative images are shown. The z projections 
of representative images are shown. Scale bars, 51m. 
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Figure 5 | ROS inhibition prevents senescence in aged satellite cells. 

a, ROS quantification in young and old satellite cells by CellROX flow 
cytometry. b, Mitochondria (MitoTracker) and ROS (CellROX) in satellite 
cells, + 24h bafilomycin treatment. Results represent increased MFI in 
the presence of bafilomycin. Representative images of young satellite cells. 
c, RT-qPCR of senescence markers + Trolox. d, Quantification of BrdUT 
and SA-(3-gal* cells from c, pretreated with Trolox (or controls that were 
not treated with Trolox), and cultured for 96h. e, Western blotting in 

cells + Trolox; for full-scan of gel see Supplementary Fig. 1. f, Geriatric 
cells + Trolox 48 h pre-treatment, were transplanted and analysed as in 
Fig. 2c. Data show mean + s.e.m. Comparisons by two-sided Mann- 
Whitney U-tests. Sample numbers were n = 60,000 cells analysed from 

3 animals for a and b; n = 3 animals per group for c-e; n= 4 engraftments 
per group for f. Representative images are shown. The z projections of 
representative images are shown. Scale bars are all 51m apart from 

50m for f. 


cyanide 3-chlorophenylhydrazone (CCCP)-damaged mitochondria 
(Extended Data Fig. 4d, e). 

Next, we analyzed how altered mitophagy led to satellite cell senes- 
cence with ageing. We detected higher levels of reactive-oxygen spe- 
cies (ROS), parkin (marking damaged mitochondria for degradation 
by mitophagy), and DNA-damage markers in Atg7-deficient satellite 
cells (Fig. 3d, g and Extended Data Figs 3h and 5a, b), associated with 
pl6'N** and pS6 induction (Fig. 3g and Extended Data Figs 3g and 
4h). Higher ROS labelling and ROS-mitochondria co-localization 
were also observed in geriatric satellite cells, correlating with impaired 
mitophagic flux (Fig. 5a and Extended Data Fig. 4g). Bafilomycin- 
induced autophagy block caused greater mitochondrial accumula- 
tion in young cells, compared with geriatric and Atg7“?*7FR cells, 
paralleling the ROS increase (Fig. 5b). To address the role of ROS in 
impaired autophagy, we inhibited it with Trolox (a vitamin E ana- 
logue) (Extended Data Fig. 5c). Trolox treatment of old GFP-LC3 
mice increased GFP-LC3 puncta (after bafilomycin treatment) and 
reduced p62 and ubiquitin aggregates and mitochondria—ROS co- 
localization in GFP-LC3 satellite cells (Extended Data Figs 4g and 5d). 
Attenuation of autophagic block by ROS inhibition was further con- 
firmed in bafilomycin-treated aged cells through LC3-I accumulation 
(Extended Data Fig. 5e, f) and an mRFP-GFP-LC3 tandem reporter, 
which detected reduced autophagosomes (RFP*/GFP* puncta) and 
rescued autophagic flux (Extended Data Fig. 5g). Trolox treatment 
prevented the appearance of senescence markers (Fig. 5c-e), restored 
the expansion (Fig. 5d), and rescued the cell-intrinsic proliferative 
and regenerative defect of geriatric satellite cells after transplantation 
(Fig. 5f and Extended Data Fig. 5h). Thus, increased ROS, resulting 
from impaired autophagy, drive satellite cell senescence in aged cells. 
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Figure 6 | Epigenetic control of p16“ expression by ROS in autophagy- 
impaired satellite cells. a, Chromatin immunoprecipitation (ChIP) for 
K119 ubiquitination of H2A (H2Aub) in geriatric satellite cells, in the 
presence or absence of 48h of Trolox treatment. RD, regulatory domain. 

b, H2Aub-ChIP for Atg77 and Atg7*?*” satellite cells treated as in a. 

c, Western blotting for cells in b; for full gel scan see Supplementary Fig. 1. 
Data show mean +s.e.m. Comparisons by two-sided Mann-Whitney U-test. 
P values indicated. Sample numbers were n = 3 animals per group for a-c. 


Loss of the polycomb repressive complex-1 (PRC1)-mediated H2A 
monoubiquitination of lysine 119 (H2Aub) at INK4a (also known as 
Cdkn2a) locus drives p16'* induction in geriatric satellite cells® 
(Extended Data Fig. 5i). We found that Trolox treatment restored 
INK4a locus H2Aub modification in geriatric and Atg7-deficient 
satellite cells (Fig. 6a, b), resulting in pl6NK4a repression, and this 
reduced senescence while promoting proliferation (Figs 5c—f and 6c 
and Extended Data Fig. 5j). Genetic silencing (with short-hairpin 
RNA) of INK4a restored proliferation in Atg7?**” satellite cells 
while reducing the expression of senescence-associated genes and 
the number of SA-(-gal* cells, and augmenting their regenerative 
capacity (Extended Data Figs 5k and 6a, b). Thus, the ROS-induced 
p16! axis links impaired autophagy and senescence in ageing 
satellite cells. 


Defective autophagy in aged human cells 

Skeletal muscles from geriatric individuals show sarcopenia and pres- 
ence of senescent satellite cells (Extended Data Fig. 7a, b)®. As in mice, 
human satellite cells from geriatric individuals showed defective pro- 
tein and organelle clearance, as indicated by p62 and mitochondrial 
accumulation (Extended Data Fig. 7c, d) compared to young cells, 
which was tightly associated with increased ROS levels (Extended Data 
Fig. 7d, e) and SA-8-gal* cells (Extended Data Fig. 7f), consistent with 
reduced proliferative potential (Extended Data Fig. 7g). The causal role 
of impaired autophagy on the geroconversion of ageing human sat- 
ellite cells under proliferative pressure was supported by the capacity 
of rapamycin to revert the abnormal mitochondrial content, protein 
aggregates and ROS (Extended Data Fig. 7c, d), and senescence phe- 
notype (Extended Data Fig. 7f-i). Thus, restoration of autophagy and 
organelle homeostasis in aged human satellite cells suffices to rescue 
senescence, as is the case in murine satellite cells. 


Discussion 

In tissues with little turnover, reversible quiescence is the normal 
stem-cell state throughout life. However, quiescence is known to be 
progressively lost ee ageing due to systemic/niche- and intrinsic- 
factor alterations”’. Recent studies showed that at geriatric age, the 
normal stem-cell quiescent state is substituted by an irreversible senes- 
cence state, which results in a numerical and functional decline of stem 
cells®. The mechanisms accounting for the maintenance of quiescence, 
preservation of the stem-cell pool and prevention of senescence during 
an individual's life remain unknown. Our results demonstrate that 
quiescent satellite cells are equipped with cytoprotective and cellular 
quality-control mechanisms that actively repress the senescence pro- 
gram, thereby preserving the integrity and fitness of cells. We provide 
evidence of loss of autophagy in satellite cells that occurs with ageing, 
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resulting in accumulation of damaged proteins and organelles, leading 
to senescence and stem-cell exhaustion. Consistent with this finding, 
genetic inhibition of autophagy specifically in satellite cells of young 
mice caused rapid entry into senescence, resulting in numerical and 
functional exhaustion of stem cells, and defective muscle regeneration. 
These findings were surprising, considering that a decline in basal 
autophagy in quiescent stem cells of physiologically aged mammalian 
organisms has not been described before. Autophagy is usually con- 
sidered to be an effector pathway, rather than a cause, of senescence, 
particularly in oncogene-induced senescence*”*?, 

How autophagy balances quiescence and senescence in muscle stem 
cells is unknown. Here we show that in adult resting muscle, quiescent 
stem cells attenuate proteotoxicity by maintaining a high basal auto- 
phagy flux, constituting a homeostatic ‘clean up’ process. This function 
is particularly critical in non-dividing stem cells, in which mitotic dilu- 
tion of intracellular toxic debris does not take place”®**, Autophagy 
failure in aged resting stem cells leads to accumulation of damaged 
proteins and dysfunctional organelles, specially mitochondria, which 
generates enhanced ROS levels that cause DNA damage and senes- 
cence entry, consistent with previous studies*>-*”. Indeed, we uncover 
ROS asa key epigenetic regulator of the senescence-promoting gene 
INK4a in ageing stem cells, by impeding PRC1-mediated lysine 119 
H2A ubiquitination, the required epigenetic mark for INK4a locus 
silencing. Consistent with this, treatment of geriatric mice (and mice 
with satellite-cell-specific Atg7 deficiency) with antioxidants not 
only restored PRC1-mediated INK4a locus repression and prevented 
satellite cell senescence, but also restored regenerative capacity. Signs 
of impaired autophagy and loss of proteostasis, correlating with senes- 
cence and defective myogenic functions, were also observed in human 
satellite cells from geriatric individuals. 

At variance with our findings, a recent study demonstrated that, 
upon in vitro stress, autophagy does not decline, but is even induced 
in haematopoietic stem cells with ageing, consistent with mainte- 
nance of haematopoietic stem cell number*’. Thus, we propose that 
long-lived quiescent stem cells within low turnover tissues primarily 
rely on autophagy to preserve fitness and avoid senescence, and that 
stem cells of skeletal muscle in particular lose this protection during 
ageing (Extended Data Fig. 7j). Notably, a recent study also reported 
that autophagy is needed for the activation of young satellite cells**. 
Furthermore, in the whole musculature, age-associated myofibre 
degeneration and mitochondrial dysfunction could also be alleviated 
by autophagy reactivation !?». 

Our studies thus demonstrate that autophagy is a decisive factor in 
the switch between the quiescence and senescence fate of muscle stem 
cells (Extended Data Fig. 7j). Although ageing-induced senescence is 
often viewed as an inescapable and irremediable process, we provide 
evidence that in vivo restoration of constitutive autophagy (or neu- 
tralization of excessive ROS) averts intracellular damage accumula- 
tion, and prevents satellite cell senescence and functional decline in 
old mice, as well as in aged human stem cells, reinforcing the notion 
that the intrinsic-ageing clock in stem cells can be pharmacologically 
manipulated. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Data reporting. No statistical methods were used to predetermine sample size. 
The investigators were not blinded to allocation during experiments and outcome 
assessment. 

Mice. Male mice (C57BL/6 (wild-type, WT), LC3-GEP, the offspring of inter- 
crossing Atg7" with Pax7“* and Pax7©'*® lines) were used at different ages. 
GFP-LC3 mice were provided by G. Marifio. Mice with the Atg7 gene deletion in 
satellite cells, as an inducible or constitutive deletion , were generated by breeding 
Atg7"" mice (previously described in ref. 46) with the Pax7"* and Pax7 lines 
(provided by C. Keller and M. Capecchi, respectively). All animal experiments 
were approved by the ethics committee of the (Barcelona Biomedical Research 
Park (PRBB) and by the Catalan Government and used sex-, age- and weight- 
matched littermate animals. 

When needed, Cre activity was induced by intraperitoneal injection (one injec- 
tion per day for 4 days) with 5 mg per 25 g body weight of tamoxifen (Sigma; 
10mgml 1 in corn oil). 

Induction of muscle regeneration. Mice were anaesthetized with ketamine and 
xylazine (80:10 mgkg~!, intraperitoneally). Regeneration of skeletal muscle was 
induced by intramuscular injection of cardiotoxin (CTX, Latoxan; 10~> M) in 
the tibialis anterior muscle of the mice as described’”. At the indicated times after 
injury, mice were euthanized and muscles were dissected, frozen in isopentane 
cooled with liquid nitrogen, and stored at —80°C until analysis. For GFP immu- 
nostaining of samples, muscles were prefixed for 2h in 2% paraformaldehyde at 
4°C, and were embedded in 15% sucrose overnight at 4°C and then frozen in 
isopentane cooled with liquid nitrogen. 

Satellite cell isolation by FACS. Muscles were mechanically disaggregated and 
incubated in Ham’s F10 media containing 0.8% collagenase D (Roche) and 0.125% 
trypsin and EDTA at 37°C with agitation, for 25 min and the supernatant was 
then filtered. The digestion procedure was repeated four times and the super- 
natants were collected. Cells were incubated in lysis buffer (BD Pharm Lyse) for 
10 min on ice, re-suspended in PBS with 2.5% goat serum and counted. PE-Cy7- 
conjugated anti-CD31 (Biolegend 102418), anti-CD11b (Biolegend 101215/16) 
and anti-Sca-1 (Biolegend 108113/14) antibodies were used to exclude the Lin (—) 
negative population and Alexa647-conjugated anti-CD34 (BD Pharmigen 560230) 
and PE-conjugated anti-«7-integrin (Ablab AB1OSTMW215) were used for dou- 
ble-positive staining of quiescent satellite cells. Cells were sorted using a FACS 
Aria II (BD). Isolated satellite cells were used either for RNA extraction or were 
cultured in Ham’s F10 supplemented with 30% FBS and bFGF (0.025 }1g ml!) 
(growth medium) for proliferation assays or plated on glass slides (Thermo 
Scientific 177402) for immunostaining analysis. 

Flow cytometry analysis. FACS isolated satellite cells (see above) were stained 
with different dyes for flow cytometry analysis. Staining for mitochondria, 
lysosomes and ROS was performed by incubating cells at 37°C with 14.M 
tetramethylrhodamine, methyl ester TMRM (T-668), 100nM MitoTracker Green 
FM (M7514), 100nM MitoTracker Red CMXRos (M7512), 500nM LysoTracker 
Green DND-26 (L7526) and 51M CellROX Green reagent (C10444), following 
the manufacturer’s protocols (Invitrogen) and directly analysed without fixing. 
Cell analysis was performed in FACS LSR Fortesa (Becton Dickinson). For MFI 
determination, we used the flow cytometry analysis software Flowlogic. MFI 
refers to the fluorescence intensity of each event (on average) of the selected cell 
population, in the chosen fluorescence channel. 

Whole-transcriptome analysis of FACS-sorted satellite cells. FACS-sorted sat- 
ellite cells were collected in lysis buffer and RNA extraction was performed using 
RNeasy Micro kit (Qiagen). The cDNA was used for transcriptome analysis by 
Agilent SurePrint G3 Mouse GE 8 x 60K high density microarray slides, per- 
formed at the microarray Unit of CRG (Barcelona, Spain). Microarray analysis 
was performed with 3 animals each. Data was normalized using cyclic loess, and 
differentially expressed genes were identified using AFM 4.0 (ref. 48) for all pair- 
wise comparisons. Raw data was taken from the Feature Extraction output files 
and was corrected for background noise using the normexp method. To assure 
comparability across samples quantile normalization was used. Differential 
expression analysis was carried out on non-control probes with an empirical 
Bayes approach on linear models (limma). Results were corrected for multiple 
testing according to the false discovery rate (FDR) method. Statistical analy- 
sis was performed with the Bioconductor project (http://www.bioconductor. 
org/) in the R statistical environment. Venn diagrams were generated using 
BioVenn”?. 

In vivo treatments. Autophagy of aged C57BL/6 and GFP-LC3 mice was 
induced as follows, one group of mice was injected i.p. with 4mg per kg body 
weight of rapamycin (LC Laboratories) or vehicle (DMSO) every other day for 
2 weeks; a second group was injected i.p. with 30 mg per kg body weight of 
Trolox (6-hydroxy-2,5,7,8-tetramethylchroman-2-carboxylic acid, Sigma) or 


ARTICLE 


vehicle (DMSO) daily for 2 weeks; and the third group of mice was treated with 
3mM spermidine ($2626 Sigma) in drinking water for 2 weeks. 

Satellite cell engraftment. Satellite cell transplants were performed as in ref. 8, 
following an adapted protocol®”. Quiescent FACS-isolated satellite cells were col- 
lected, re-suspended in 20% FBS Ham’s F10 medium and injected into muscles of 
recipient mice previously injured with cardiotoxin the day before. The recipient 
mice were SCID mice. For each mouse, 10,000 cells were injected. At 4 days (for 
proliferation, senescence analyses) or 1 month (muscle regeneration) after cell 
injections, engrafted muscles were collected and processed for muscle histology. 
Results are expressed as relative number of GFP* per muscle section, with respect 
to the control data for young cells, which was set at 100% . 

In vitro treatments. Experiments for in vitro rescue of defective autophagy 
in satellite cells were performed in 20% FBS containing Ham's F10 medium 
(growth medium), and with the addition of either rapamycin (100ngml~!, LC 
Laboratories), Trolox (100 1M , Sigma), spermidine (51M, Sigma) or vehicle 
(DMSO) for 48h. Mitochondrial, lysosomal, and ROS analyses or ChIP experi- 
ments were performed immediately after treatments, whereas proliferation assays 
(BrdU staining) and senescence analysis (SA-6-gal assay and determination of 
RNA and protein expression of senescence markers), were performed 96h after 
treatments. 

For the satellite cell treatments for in vivo engraftment in injured muscles, 
fresh FACS-isolated satellite cells from resting muscle of young and geriatric 
mice were treated for 48h with rapamycin (100ngml~!, LC Laboratories), 
Trolox (100M, Sigma) or vehicle (DMSO) before engraftment into pre- 
injured muscles of recipient mice. For each mouse, 10,000 cells were injected. 
At 4 days after cell injections, engrafted muscles were collected and processed 
for muscle histology. 

Bafilomycin (10nM Sigma B1793) was used to block autophagy for 4h at 
37°C and to analyse autophagosome accumulation by FACS, immunostaining 
and western blotting. CCCP (carbonyl cyanide 3-chlorophenylhydrazone, 
101M Sigma C2759), which abolishes the link between the respiratory chain and 
the phosphorylation system in intact mitochondria, causes mitochondria uncou- 
pling and was used to treat satellite cells in vitro for 1h to induce the selective 
autophagy of CCCP-damaged mitochondria (mitophagy). 

Plasmid transfection. Freshly isolated cells were transfected with mRFP-GFP- 
LC3 (ref. 23) plasmid using Lipofectamine 3000 (Invitrogene), and treated for 48h 
with Trolox (25 jlml~!, Sigma) or vehicle (DMSO) and analysed on glass slides 
(Thermo Scientific 177402). Cells were fixed with 4% paraformaldehyde in PBS 
for 10 min and the nuclei were stained with DAPI (Invitrogen). After washing, 
glass slides were mounted with Mowiol. Measuring autophagy flux through this 
method is based on the concept of lysosomal quenching of GFP. GFP is a stably 
folded protein and relatively resistant to lysosomal proteases. However, the low 
pH inside the lysosome quenches the fluorescent signal of GFP, which makes it 
difficult to trace the delivery of GFP-LC3 to lysosomes. In contrast, RFP exhibits 
more stable fluorescence in acidic compartments, and mRFP-LC3 can be readily 
detected in autolysosomes. By exploiting the difference in the nature of these two 
fluorescent proteins (that is, lysosomal quenching of GFP fluorescence versus 
lysosomal stability of RFP fluorescence), autophagic flux can be morphologically 
traced with an mRFP-GEP-LC3 tandem construct”’. With this tandem construct, 
autophagosomes and autolysosomes are labelled with yellow (mRFP and GFP) 
and red (mRFP only) signals, respectively. 

Proliferation assay. Satellite cells were labelled with BrdU (1.5 1g ml“; Sigma) 
for 1h. BrdU-labelled cells were detected by immunostaining using rat anti-BrdU 
antibody (Oxford Biotechnology; 1:500) and a specific secondary biotinylated 
goat anti-rat antibody (Jackson Inmunoresearch; 1:250). Antibody binding was 
visualized using Vectastain Elite ABC reagent (Vector Laboratories) and DAB. 
BrdU-positive cells were quantified as the percentage of the total number of cells 
analysed. 

SA-B-gal activity. SA-3-gal activity was detected in satellite cells using the senes- 
cence (-galactosidase staining kit (Cell signaling), according to the manufacturer's 
instructions. SA-B-gal* cells were quantified as percentage of the total number 
of cells analysed. 

Lentivirus infection. Freshly isolated satellite cells were ex vivo infected with 
distinct lentivirus for 12h. Medium was replaced and cells were transplanted 
into injured muscle of recipient mice for in vivo analysis, or subjected to in vitro 
assays. LV-Atg7, used for Atg7 overexpression in satellite cells, was provided 
by Eliezer Masliah’s laboratory*!. LV-sh-p16!\*“*, used to silence INK4a, and 
LV-sh-scramble (used as control), were previously described in ref. 8. 
Heterografting experiments. Extensor digitorum longus (EDL) muscles from 
geriatric wild-type mice were infected with lentivirus (LV-Atg7 or LV-GFP, as 
well as LV-sh-p16'N¥** or LV-sh-scramble) and grafted immediately onto the 
tibialis anterior muscle of young wild-type recipient mice, and regeneration 
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(formation of new myofibres derived from EDL-associated satellite cells) in the 
transplanted EDL muscles was analysed after 6 or 8 days. Fibre size of eMHCt 
myofibre was analysed using the Fiji program. 

RT-qPCR: RNA extraction, cDNA synthesis and PCR. Total RNA was iso- 
lated from either FACS-isolated satellite cells of mouse muscle tissue or human 
myoblasts obtained from human muscle biopsies, using Tripure reagent (Roche 
Diagnostic Corporation) or RNeasy Micro kit (Qiagen), and analysed by RT- 
qPCR. For qPCR experiments, DNase digestion of 10 mg of RNA was performed 
using 2 U DNase (Turbo DNA-free, Ambion). Complementary DNA (cDNA) was 
synthesized from total RNA using the First-Strand cDNA Synthesis kit (Amersham 
Biosciences). Real-time PCR reactions were performed on a LightCycler 480 
System using Light Cycler 480 SYBR Green I Master reaction mix (Roche 
Diagnostic Corporation) and specific primers. Thermocycling conditions were as 
follows: initial step of 10 min at 95°C, then 50 cycles of 15s denaturation at 94°C, 
10s annealing at 60°C and 15s extension at 72°C. Reactions were run in tripli- 
cate, and automatically detected threshold cycle values were compared between 
samples. Transcript of the ribosomal protein L7 housekeeping gene was used as 
endogenous control, with each unknown sample normalized to L7 content. The 
following primers were used, INK4a, forward: CATCTGGAGCAGCATGGAGTC, 
reverse: GGGTACGACCGAAAGAGTTCG; p21“, (also known as Cdkn 1a) 
forward: CCAGGCCAA GATGGTGTCTT, reverse: TGAGAAAGGATCA 
GCCATTGC; MyoD (also known as Myod1), forward: GCCGCCTGAGC 
AAAGTGAATG, reverse: CAGCGGTCCAGGTGCGTAGAAG; Mgn (Myog), 
forward, GGTGTGTAAGAGGAAGTCTGTG, reverse: TAGGCGCTCAAT 
GTACTGGAT; Ki67 (Mki67), forward, ACCGTGGAGTAGTTTATCTGGG, 
reverse, TGTTTCCAGTCCGCTT-ACTTCT; p15'%*# (also known as Cdkn2b), 
forward, TCTTGCATCTCCACCAGCTG, reverse, CTCCAGGTTTCCCA 
TTTAGC; Atg7, forward, TCTGGGAAGCCATAAAGTCAGG, reverse, 
GCGAAGGTCAGGAGCAGAA. 

Electron microscopy images. For electron microscopy images, tibialis anterior 
muscles from 3- and 24-month-old wild-type mice were fixed with 2% para- 
formaldehyde and 2.5% glutaraldehyde in phosphate buffer (0.1 M, pH 7.4). 
Samples were processed by the CCit Microscopy Facility at the University of 
Barcelona. Images were acquired using a Jeol 1010 microscope, working at 80kV 
and equipped with a CCD Megaview III camera. Identification of satellite cells in 
skeletal muscle by electron microscopy was based on cell size, content of hetero- 
chromatin and position with respect to basal lamina. 

Western blotting. Preparation of mouse and human satellite cell lysates and 
western blotting was performed as described previously in ref. 52. Antibodies 
used were: anti-p62/SQSTM1 antibody produced in rabbit (Sigma P0067), 
rabbit anti-LC3 (Novus Biologicals NB100-2331), phospho-S6 ribosomal 
protein (Ser240/244) XP rabbit monoclonal antibody (Cell Signaling 5364), 
rabbit anti-p16 (Santa Cruz Biotechnology sc-1207), rabbit anti-parkin (Abcam 
ab15954), S6 ribosomal protein (54D2) mouse (Cell Signaling 2317), H2AX 
Ser 139 (Cell Signaling 2577S), rabbit anti-53BP1 (Abcam ab21083) and Tubulin 
(Sigma T-6199). 

ChIP. Briefly, freshly isolated satellite cells were cultured with Trolox or vehi- 
cle (DMSO) for 48h and crosslinked with 1% formaldehyde for 15 min at 
room temperature. For each ChIP, 300,000 cells were lysed in 13011 of lysis 
buffer B (Low Cell ChIP Kit, Diagenode) and chromatin was sonicated for 
10 min in a M220 Focused-ultrasonicator, Covaris (Duty cycle 5%, Peak inci- 
dent power 75 W and 200 cycles per burst). Sonicated chromatin was then 
diluted and subjected to immunoprecipitation with 3 tl of antibody against 
ubquitinated histone (Ubiquityl-Histone) H2A (Lys119) (D27C4) (Cell 
Signaling, 8240) or 31 of IgG. Bound fraction and input were analysed by 
qPCR using specific primer sets for the INK4a locus. INK4a_RD forward, 
GGTCTCCCCTAGCAGGATTEG, reverse GCCTGTCATTAAACAGGGTGA; 
INK4a_exon1 forward, CCGGAGCCACCCATTAAACTA, reverse CAAGACTT 
CTCAAAAATAAGACACTGAAA; INK4a_exon2 forward, CCCAACACCC 
ACTTGAGGAA, reverse, CAGAGGTCACAGGCATCGAA. 

Histology and immunohistochemistry in muscle cryosections. Tibialis 
anterior and extensor digitorum longus (EDL) muscles were frozen in iso- 
pentane cooled with liquid nitrogen, and stored at —80°C until analysis. Then 
10-j1m sections were collected from muscles and were either stained with 
haematoxylin and eosin or immunostained. Labelling of cryosections with 
mouse monoclonal primary antibodies was performed using the peroxidase or 
fluorescein M.O.M. kit staining (Vector Laboratories) according to the manu- 
facturer’s instructions. Double immunostaining was performed by sequential 
addition of each primary and secondary antibody using appropriate positive 
and negative controls. Sections were air dried, fixed on 2-4% paraformalde- 
hyde, washed on PBS and incubated with primary antibodies according to 
manufacturer's instructions after blocking for 1h at room temperature with 


a high-protein-containing solution in PBS (Vector Laboratories). The slides 
were then washed with PBS and incubated with the appropriate secondary 
antibodies and labelling dyes. For immunofluorescence, secondary antibod- 
ies were coupled to Alexa-488, Alexa-568 or Alexa-647 fluorochromes, and 
nuclei were stained with DAPI (Invitrogen). After washing, tissue sections 
were mounted with Mowiol. 

Antibodies used for immunohistochemistry. Immunohistochemistry 
on muscle cryosections or isolated satellite cells was performed with the 
following antibodies: GFP (Invitrogen A6455 and Aves labs GFP-1020), anti- 
eMHC (F1.652), anti-Pax7 (DSHB), p16 (Santa Cruz sc-1207), YH2AX Ser139 
(2577S), rabbit polyclonal anti-MyoD (Santa Cruz Biotechnology sc-760), 
anti-myogenin (DSHB F5D), poly-ubiquitinylated proteins, multi-ubiquitin 
chains, mouse monoclonal antibody (Enzo life sciences PW8805), anti-p62/ 
SQSTM1 antibody produced in rabbit (Sigma P0067), mouse monoclonal anti- 
body to LC3 (NanoTools 5F10), LAMP-1 (Santa Cruz Biotechnology sc-19992), 
phospho-S6 ribosomal protein (Ser240/244) XP rabbit monoclonal antibody 
(Cell Signaling 5364), anti-CD56 (BD Pharmingen 556325), anti-TOM20 
(ab56783). 

Human muscle samples. Muscle biopsies from 8 adults and 10 geriatric (28 +7 
and 83 +7 years old, respectively) human subjects were obtained via the Tissue 
Banks for Research from Vall d’Hebron and Sant Joan de Deu Hospitals and 
especially via the EU/FP7 Myoage Consortium. Muscle biopsies were taken from 
the vastus lateralis muscle under local anaesthesia (2% lidocaine). A portion of 
the muscle tissue was directly frozen in melting isopentane and stored at —80°C 
until analysis. Human primary myoblasts from 5 young/adult (25 + 4 years old) 
and 5 geriatric (75 + 4 years old) subjects were obtained from the EU/FP7 Myoage 
Consortium or purchased from Cook Myosite and cultured following the pro- 
vided instructions. 

Digital image acquisition and processing. Digital images were acquired 
using: (1) an upright microscope DMR6000B (Leica) equipped with 
a DFC300FX camera for immunohistochemical colour pictures and a 
Hamamatsu ORCA-ER camera for immunofluorescence pictures; (2) confocal 
images of muscle sections or isolated satellite cells were taken using either 
a Zeiss LSM-780 confocal system with a Plan-Apochromat 63 x /1.4 NA oil 
objective or a Leica SPE confocal laser scanning microscope system with HCX 
PL Fluotar 10 x /0.30 NA, 20 x /0.50 NA and 40 x /0.75 NA objectives. The 
different fluorophores (3 to 4) were excited using the 405, 488, 568 and 633 
nmexcitation lines. Acquisition was performed using Zeiss LSM software 
Zen Black or Leica Application or LAS AF software (Leica). Images were 
composed and edited in Photoshop CS5 (Adobe), in which background was 
reduced using brightness and contrast adjustments applied to the whole image. 
To assess myofibre size, individual fibres were manually outlined and their 
cross-sectional area (CSA) was determined with the public domain image 
analysis software Fiji. Fluorescence intensity of selected proteins for each cell 
was quantified using Fiji software and the average of relative fluorescence 
was expressed as MFI. 

The number and percentage of cellular area occupied by GFP-LC3 puncta 

were determined on digital images with Fiji and the cell image analysis software 
CellProfiler**. Co-localization of RFP-LC3 and GFP-LC3 puncta was determined 
on the maximum projection of three z-sections using a Fiji automated macro pipe- 
line calculating single and double-positive autophagosomes. Co-localization of 
p62 and ubiquitin was determined on digital images Fiji, according to ref. 54, with 
respect to the total cellular area. The Pearson’s coefficient (r) was used to analyse 
the correlation of the intensity values of green and red pixels in dual-channel 
images. This coefficient measures the strength of the linear relationship between 
the intensities in two images calculated by linear regression and ranges from 
1 to x1, with 1 standing for complete positive correlation and x1 for a negative 
correlation, with zero standing for no correlation*!. Video reconstructions of 
autophagosomes were generated in Imaris software using full confocal z-stacks 
(around 20) of each cell. The z-stacks were previously imported to Fiji software 
for background adjustments and then deconvolved using the blind-deconvolution 
wizard in Huygens software. 
Statistical analysis. For mouse experiments, no specific blinding method was 
used, but mice in each sample group were selected randomly. The sample size (n) of 
each experimental group is described in each corresponding figure legend, and all 
experiments were repeated at least with three biological replicates. GraphPad Prism 
software was used for all statistical analyses. Quantitative data displayed as histo- 
grams are expressed as means + standard error of the mean (represented as error 
bars). Results from each group were averaged and used to calculate descriptive 
statistics. Mann-Whitney U-test (independent samples, two-sided) was used for 
pairwise comparisons among groups at each time point. Statistical significance 
was set ata P<0.05. 
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Extended Data Figure 1 | See next page for caption. 
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Extended Data Figure 1 | The reduced autophagy flux in quiescent 
satellite cells can be increased by pharmacological treatment in vivo. 

a, Venn diagrams of overlapping genes between a proteostasis gene set 
(See Supplementary Table 1) and genes significantly upregulated in 
quiescent satellite cells from the indicated publications or from our gene 
expression microarray data comparing freshly FACS isolated satellite 

cells from resting muscle, or muscles obtained 72h after cardiotoxin 
(CTX) injury, from young, wild-type mice. b, K-means clustering analysis 
(performed with Gene-E, Broad Institute) of the gene expression of the 
autophagy-related genes during ageing. Clusters are shown with heat maps 
of the normalized raw data. Each column represents a different sample and 
each row a different gene probe. Red, increased expression; white, neutral 
expression; blue, decreased expression. c, Representative example of the 
FACS strategy and gating scheme for isolating satellite cells from mice in 
resting conditions. d, Pax7 and GFP immunostaining of freshly isolated 
satellite cells from resting muscles of young and old GFP-LC3 mice. 

Scale bar, 5j1m. e, Electron microscopy images of young and old satellite 
cells on sections of resting tibialis anterior (TA) muscle of wild-type 

(WT) mice. Arrowheads indicate autophagic vesicles. Scale bars, 
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1jm and 0.5 1m (right and left, respectively). f, Pax7 and GFP 
immunostaining on tissue sections from resting tibialis anterior muscles 
of young and old GFP-LC3 mice. Arrowheads indicate autophagic 
vesicles. Scale bar, 541m. g, p62 and ubiquitin (Ub) MFI. Arrowheads, 
co-localization of p62 and ubiquitin aggregates. h, LC3 western blot of 
freshly isolated satellite cells from young and old, wild-type mice, treated 
with bafilomycin or vehicle for 4h before collection. Graph shows LC3I1 
quantification, after normalization with tubulin levels; for full scan see 
Supplementary Fig. 2. i, Quiescent satellite cells were freshly isolated from 
old, wild-type mice subjected to two weeks of rapamycin, spermidine 

or vehicle (control) treatment. Cells were treated (or not treated) with 
bafilomycin 4h prior to analysis by immunostaining of LC3 marker. 

Z projections of representative fluorescence microscopy images are shown. 
Scale bars, 541m. j, Representative fluorescent microscopy images from 
Fig. 1d. Scale bar, 5\1m. Data show mean + s.e.m. Comparisons by 
two-sided Mann-Whitney U-test. P values are indicated. Number of 
samples were n = 3 animals per group for a and b; n= 35 (young) and 66 
(old) cells analysed from 3 animals for g; n = 3 animals per group for h. 
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Extended Data Figure 2 | Reinduction of autophagy rescues 
proliferation and reduces senescence in geriatric satellite, thus 
restoring regenerative capacity. a, Transplanted muscles from Fig. 2c 
were immunostained for GFP and for Ki67, Pax7, MyoD or Mgn (to 
determine the distinct possible myogenic states of satellite cells in the 
regenerating muscle). Scale bars, 501m. b, Autophagy flux analysed by 
flow cytometry in freshly isolated satellite cells from resting muscle of 
GFP-LC3 mice, treated for 48 h with rapamycin or vehicle (control). 
Satellite cells were treated with bafilomycin or vehicle for 4h before 
analysis. Results are expressed as the change in GFP-LC3 MFI in 
bafilomycin (—) compared to bafilomycin (+) conditions. c, Western 

blot analysis of pS6 protein levels in geriatric satellite cells from wild-type 
mice, treated for 48 h with rapamycin or vehicle (control). Graph shows 
pS6 quantification, normalized to tubulin; for full scan see Supplementary 
Information Fig. 2. d, As in Fig. 2c, percentage ~H2AX* or p16'N**** cells 
from total GFP* cells were quantified. Scale bars, 10\m. e, Quantification 
BrdU* and senescence-associated 3-gal* satellite cells, pre-treated as in 
Fig. 2c and analysed after 96h. f, Quantification of senescent (senescence- 
associated (-gal*) satellite cells, isolated from young and geriatric wild- 
type mice, pre-treated for 48 h with spermidine or vehicle (control) and 
cultured for 96h. g, Quantitative real-time PCR (RT-qPCR) analysis of 
Atg7 expression on satellite cells infected with LV-Atg7 or LV-control 
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(LV-Co), and cultured for 96 h. h, GFP-LC3 satellite cells were infected 
with LV-Atg7 or LV-Co and treated with bafilomycin or vehicle for 4h 
before analysis. Autophagy flux was analysed by flow cytometry and 
represented as in b. Representative images are shown. Scale bar, 10,1m. 

i, Muscle regeneration experiment by satellite cell transplantation. An 
equal number of satellite cells from young and geriatric mice infected 
with a lentivirus overexpressing the Atg7 gene (LV-Atg7) or a lentivirus 
control (LV-Co), which also expressed GFP, were transplanted into injured 
muscle of young immunodeficient mice, and collected 28 days later. GFP 
expression in muscles was analysed by immunostaining. Quantification 
of GFP* cells (fibres) per muscle field versus transplanted control-treated 
satellite cells. Representative images are shown. Scale bar, 75m. 

j, EDL geriatric muscles, infected with LV-Atg7 or LV-Co, and grafted on 
recipient mouse muscle. Regeneration was analysed 8 days later. Frequency 
distribution of regenerating fibres by size. Scale bar, 25 ,1m. Data are 

mean + s.e.m. Comparisons by two-sided Mann-Whitney U-test. P values 
are indicated. Number of samples were n = 60,000 cells analysed from 

3 animals for b; n = 3 animals per group for c; n =5 engraftments 

per group for d; n = 3 animals per group for e-g; n = 60,000 cells 

analysed from 3 animals for h; n= 3 engraftments per group for i; 

n=4 engraftments per group for j. 


© 2016 Macmillan Publishers Limited. All rights reserved 


Control Rapamycin 
a Atg7“™ Atg74Pex7 b 5 + Baf + Baf c 
a I WT 
5  Atg7 bee = 
5 g LC3I [A tg74Pe 2 S 
$15 Losi: =z B 200000) px0.009 — Atg7r 
| pss Atg7"" go 26" © 50000 DAtg7~" 
> 40 [BAtg7“r~’ & 204 3 
g p62 = 8 100000: 
3 £ 15) s + Baf + Baf ea 
© 08 = 10 é > 50000 
< E a Q 
E oo Tubulin g 5 > = oo 
E 0. bd 
e f : 14h post-injury 
MyoD Mgn Ki67 Atg7" 
5 ead ‘APaxT 
3 - F 31 Atg7™" xt BB Atg7** 
g —— N.S. N.S. Atg74Pex7eR p<0.04 
Siofl B solte “7 7 2 60 
8 salt rn 8 >| 
_ = F a 
é 2 0 0 cs oi 
S 0: NR = 
> z 2 © 20 
z & 0.0 0.0 0.0 oa & 
= 6 
h 15 days post-injury i 
= § Atg7™" 
KR WT ‘APax7ER 
2 > i pe Atg7irer 
<x <x 8 50, p<0.03 
4 p<0.001 2 40] + 
T 8 30 
Q i z 6 5 p<0.03 
i x 3 
KR a 3 4 I A & 10 [] 
i) >) Ea & 0 = 
=x Bd 1 
<x L- 7 15 
— days post-injury 
j Atg7 
J Atg7aPorek k > Atg7*rex Atg7"" 
o p<0.0001 3 DIAtg74Pex" 
=) t 200 = 
2 = 2500, 
2 = 1500} p<oot 2 = 0.0001 
74 g g E 2000) 
g 5 100 ® = 1500] 
a g > fi i 
g ir 500 a @ 1000} 
5 N Fa 500 
Ik 7 28 + o} 
days post-injury N 
p Atg7"":GFP-LC3 
1 pas m 7 Cell Transplantation NS n 
ie a 4d @ 7 Oe Atg7 
> 7 —p Analysis NS oO Atg74Pax7 
= Atg7"" a Atg7" Control 2 15 1 g 
3 10 Atg7éPav7er SCID mice = 150 ass DAtg7“"7Control 2 
oO a eS 
io Atg74P*": GFP-LC3 5 i Atg7™ Rapamycin: = 10 
S 2 100 WB Atg7“"2” Rapamycin Ks 
gg 5 2 0) 
x x 
% a 50 0 ual 
9 a 6 4 days post-transplantation 
ae Atg7“" Control 2 0 ee 7 
7 days post-injury [Atg7*"="Control 
Atg7"" Rapamycin Control Rapamycin 
fe} fm Atg7*°’ Rapamycin 
BB Atg7"" Spermidine 
i Atg7“"= Spermidine = 
p<0.05 2 
s <0.05 <x 
p<0.05 = pa aee 
60) [1 = 10 
& 28 % 
4 ry o 
8 =o a 
2 20 a * 2 
2 ; ae 
a é 2 a 
0. 0 


Extended Data Figure 3 | See next page for caption. 
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Extended Data Figure 3 | Genetic impairment of autophagy in young 
quiescent satellite cells leads to premature senescence and impaired 
muscle regeneration. a, RT-qPCR analysis of Atg7 expression and 
western blot analysis of LC3, p62 and tubulin of satellite cells isolated 
from Atg7“" and Atg7“"**” mice. Graph shows the quantification 

of p62 normalized to tubulin; for full scan see Supplementary Fig. 2. 

b, Quiescent satellite cells were freshly isolated from Atg7“' and 
Atg7“?**7 mice which had been subjected to two weeks of rapamycin 

or vehicle (control) treatment in vivo. Cells were treated (or controls 
were untreated) with bafilomycin 4h before analysis by fluorescence 
microscopy. Z projections of representative fluorescence microscopy 
images are shown. Scale bar, 5|1m. c, Quantification of satellite cells in 
resting muscle of three-month-old Atg7“? and Atg7“?**’ mice by flow 
cytometry analysis («7 integrin* CD34" cells per gram of muscle tissue). 
d, Representative fluorescent microscopy images from Fig. 3d. Scale 

bar, 10j1m. e, RT-qPCR analysis of MyoD, Mgn and Ki67 expression in 
freshly isolated quiescent satellite cells from resting muscle of Atg7“7 
and Atg7*?*7ER mice, 7 days after tamoxifen treatment. f, Percentage of 
activated satellite cells (Pax7*/MyoD*) from the total Pax7* cells (FACS- 
isolated 14-h post-injury from (a)). Scale bar, 501m. g, pS6 and Lamp1 
immunostaining of cells from a. Scale bar, 101m. h, yYH2AX protein levels 
per nucleus in Pax7* satellite cells in tibialis anterior muscles of Atg7“7 
and Atg7*?7ER mice, 15 days post-injury. Representative images are 
shown. Scale bar, 251m. i, Pax7* satellite cells were quantified following 
immunostaining on regenerating muscles of Atg7“? and Atg74?**7E8 
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mice 7 days and 15 days after cardiotoxin injury. j, Representative images 
of haematoxylin and eosin staining of muscles at 7 days post-injury on 
muscles of Atg7“! and Atg7?*’EX mice. Fibre size of central-nucleated 
myofibres at 7 days and 28 days post-injury is quantified. Scale bar, 501m. 
k, Tibialis anterior muscles of Atg7“7 and Atg7“"**7 mice were injured 
by cardiotoxin injection and 21 days later these muscles were reinjured 
and then subsequently analysed 21 days later (21 + 21 days post-injury). 
The size of central-nucleated myofibres was quantified. Representative 
images are shown. Scale bar, 50m. 1, Pax7* and Ki67* double-positive 
satellite cells were quantified following immunostaining on regenerating 
muscles of Atg7“? and Atg7?*’E® mice 7 days after cardiotoxin injury. 
m, An equal number of quiescent satellite cells from Atg7!:GFP-LC3 
and Atg7“?**7:GFP-LC3 mice (two weeks + rapamycin pre-treatment), 
transplanted as in Fig. 2c, and immunostained with the indicated 
antibodies 4 days later. Quantification of GFP* cells per muscle field. 
Values relative to transplanted young cells (100%). Representative images 
are shown. Scale bar, 751m. n, Percentage of GFP* cells that are also 
Ki67* cells in muscles from m. 0, Quantification of proliferating (BrdU*) 
and senescent (SA-3-gal*) satellite cells, isolated from Atg7“7 and 
Atg7“?’, pre-treated for 48h with spermidine or rapamycin (or control 
vehicle) and cultured for 96h. Data show mean + s.e.m. Comparisons by 
two-sided Mann-Whitney U-test. P values are indicated. The number of 
samples were n= 3 animals per group (a); n= 7 animals per group (c); 
n=3 animals per group (e-I); n= 4 engraftments per group (m, n); n=3 
animals per group (0). 
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Extended Data Figure 4 | See next page for caption. 
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Extended Data Figure 4 | Autophagy loss in satellite cells causes 
dysfunctional mitophagy and mitochondria accumulation, 

leading to increased ROS and senescence. a, p62 and ubiquitin 
immunostaining on freshly isolated satellite cells from resting muscle 
of three-month-old Atg7“" and Atg74?**7ER mice, one month after 
tamoxifen treatment. Arrowheads indicate co-localization of p62 and 
Ub aggregates. Representative images are shown. Scale bar, 5 j1m. 

b, TOM20 and Lamp1 immunostaining of quiescent satellite cells 
isolated from young and geriatric WT mice. Mice were subjected to two 
weeks of rapamycin, spermidine or Trolox (or vehicle) treatment before 
analysis. Co-localization was calculated as the area occupied by the 
immunofluorescence co-localizing staining on images with respect to 
the total cellular area. The Pearson's coefficient (r) was used to analyse 
the correlation of the intensity values of green and red pixels in the 
dual-channel images. The z projections of representative fluorescence 
microscopy images are shown. Scale bar, 51m. c, Mitochondria 
quantification by MitoTracker in quiescent satellite cells of old mice, 
treated with rapamycin or vehicle for two weeks. d, Mitochondria 
(MitoTracker labelling) in young or geriatric cells. Satellite cells, were 
pre-treated with CCCP for 1h (see Methods) and + rapamycin for 
24h. Percentage of MitoTracker MFI reduction + rapamycin. e, For the 
mitochondrial membrane potential analysis, satellite cells were freshly 
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isolated from young wild-type mice and treated for 1 h with CCCP or 
DMSO (control). Membrane potential (TMRM MFI/MitoTracker 

Green MFI ratio) of cells was calculated by flow cytometry analysis at 

lh and 24h after CCCP treatment (being 100% the membrane potential 
value of control satellite cells). f, Mitochondria content was quantified 

by MitoTracker staining of satellite cells from young and geriatric 
wild-type mice treated with rapamycin or vehicle (control) for 48h. The 

z projections of representative fluorescence microscopy images are shown. 
Scale bar, 541m. g, Mitochondria and ROS detection by MitoTracker and 
CellROX staining, respectively. Co-localization was calculated as in b. 
The z projections of representative fluorescence microscopy images are 
shown. Scale bar, 51m. h, Representative images of freshly isolated satellite 
cells from resting muscle of three-month-old Atg77 and Atg7“?**” mice 
stained with CellROX fluorescent dye and p16'N** antibody. Scale bar, 
5\um. Data are mean + s.e.m. Comparisons by two-sided Mann-Whitney 
U-test. P-values are indicated. Number of samples were n = 36 (Atg7") 
and n= 38 (Atg7?@x7ER) cells analysed from 3 animals (a); n = 23 (young), 
n= 24 (control), n= 42 (rapamycin); n = 28 (spermidine) and n= 21 
(Trolox) cells analysed from 3 animals (b); 1 = 60,000 cells analysed from 
3 animals (c); n = 40,000 cells analysed from 4 animals (d); n = 30,000 
cells analysed from 3 animals (e, f); n = 18 (young), 21 (control), 15 
(rapamycin) and 13 (Trolox) cells analysed from 3 animals (g). 
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Extended Data Figure 5 | See next page for caption. 
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Extended Data Figure 5 | ROS inhibition in autophagy-impaired aged 
and Atg7 null satellite cells significantly restores cell proteostasis. 

a, ROS-level quantification in quiescent satellite cells from three-month-old 
Atg77 and Atg74"**” mice by CellROX flow cytometry. Representative 
images are shown. Scale bar, 51m. b, Western blot analysis of 53BP1 

and parkin in satellite cells isolated from three-month-old Atg7“" and 
Atg7“""7 mice. Tubulin control is the same tubulin control for Fig. 3g. 
Graph shows quantification of 53BP1 and parkin protein normalized 

to tubulin; for full scan see Supplementary Information Fig. 1. 

c, Quantification of ROS levels for satellite cells isolated from young and 
geriatric WT mice by flow cytometry using CellROX fluorescent dye. 
Satellite cells were treated with Trolox or vehicle (control) for 48 h before 
analysis. Results are represented as variation of MFI between young and 
geriatric satellite cells. d, Quantification of p62 and ubiquitin protein 
levels on immunostained freshly isolated satellite cells from resting 
muscle of old wild-type mice, in vivo treated for 2 weeks with Trolox or 
vehicle (control). Representative images are shown. Scale bar, 5 um. 

e, Western blot analysis of LC3 and tubulin in satellite cells isolated from 
geriatric WT mice and treated for 48 h with Trolox or vehicle (control), 
in the absence or presence of bafilomycin for 4h before analysis. Graph 
shows quantification of LC3H] protein normalized to tubulin; for full 

gel scan see Supplementary Information Fig. 2. f, Autophagy flux and 
mitochondria in satellite cells from GFP-LC3 mice (two weeks with or 
without Trolox treatment). Satellite cells treated for 4h + bafilomycin 
treatment. Representative images are shown. Scale bar, 51m. g, The 
mRFP-GFP-LC3 plasmid was transfected into young or geriatric satellite 
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cells, with 48 h treatment + Trolox and then 4h treatment + bafilomycin, 
prior to fixation. The percentage of autophagosomes was quantified as in 
Fig. 2a. h, Muscle regeneration using geriatric satellite cell transplantation. 
An equal number of freshly isolated geriatric satellite cells, infected 

with GFP lentivirus and treated for 48 h with Trolox or vehicle, were 
transplanted into injured muscle of young immunodeficient mice. Four 
days later, muscles were collected and immunostained for GFP, MyoD 

and Mgn (to determine the possible myogenic states of satellite cells in 

the regenerating muscle). Representative images are shown. Scale bar, 
50m. i, ChIP analysis for H2AK119ub (H2Aub) in satellite cells isolated 
from young and geriatric wild-type mice. j, Quantification of proliferating 
(BrdU*) and senescent (SA-6-gal*) satellite cells isolated from Atg7“? 
and Atg7APa7 mice treated 48 h with Trolox or vehicle (control) and 
cultured for 96h. k, Quantification of proliferating (BrdU*) and senescent 
(senescence-associated 3-gal*) satellite cells isolated from Atg7™? and 
Atg7“?"7 mice and infected with LV-sh p16'N* or LV-sh scramble, and 
cultured for 96h. Data show mean + s.e.m. Comparisons by two-sided 
Mann-Whitney U-tests. P values are indicated. Number of samples were 
n= 60,000 cells analysed from 3 animals (a); » = 3 animals per group 

(b); n= 60,000 cells analysed from 3 animals (c); n = 36 (control) and 

n= 35 (Trolox) cells analysed from 3 animals (d); n =3 animals per 

group (e); nm = 60,000 cells analysed from 3 animals (f); nm =21 (young), 
n= 20 (young, Trolox), n= 19 (young, + bafilomycin), n= 18 (young, 
Trolox + bafilomycin), n = 21 (geriatric), n= 19 (geriatric, Trolox), n= 15 
(geriatric, + bafilomycin) and n = 37 (geriatric, Trolox + bafilomycin) cells 
analysed from 3 animals (g); n = 3 animals per group (i-k). 
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Extended Data Figure 6 | Effects of p16'N* silencing in autophagy- 
impaired young murine satellite cells. a, Western blotting quantification 
of Atg7“”**’ satellite cells, infected with lentiviral LV-sh-p16'N** or 
LV-sh-scramble and analysed 96h later; for full gel scan see Supplementary 
Fig. 2. b, Atg7“? and Atg7“?**” EDL, infected with LV-sh-p16'N** or 
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LV-sh-scramble, and grafted as Extended Data Fig. 2). Representative 
eMHC-immunostaining. Scale bar, 25 j1m. Data show mean + s.e.m. 
Comparisons by two-sided Mann-Whitney U-test. P values are indicated. 
The number of samples were n = 3 animals per group (a) andn=4 
engraftments per group (b). 
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Extended Data Figure 7 | Impaired autophagic flux in human geriatric 
satellite cells. a, Representative images of haematoxylin and eosin staining 
of human muscle biopsies from young (25 years old) and geriatric 

(95 years old) donors in resting conditions. Arrowheads indicate atrophic 
myofibres. Scale bar, 501m. b, CD56 and pl6NK4a immunostaining on 
human muscle sections of samples described in a. Scale bar, 10,1m. 

c, Western blotting analysis of p62 protein in human satellite cells from 
young (about 25 years old) and geriatric (over 75 years old) donors, 
treated for 48 h + rapamycin; for full gel scan see Supplementary Fig. 2. 

d, ROS and mitochondrial content analysis in human cells from treated 
for 48h + rapamycin. Graphs show MFI variation. Scale bar, 5 zm. 

e, Representative images from CellROX staining from d. Scale bar, 51m. 
f, Quantification of SA-6-gal* human cells treated for 48 h + rapamycin. 
Quantification was carried out 96h after treatment. Scale bar, 200 1m. 


g, Quantification of proliferating (BrdU*) young and geriatric human 
satellite cells in culture. Representative pictures are shown. Scale bar, 

25 um. h, Western blot analysis of pS6, total S6 and tubulin in young and 
geriatric human satellite cells treated for 48 h with rapamycin or vehicle 
(control). Graphs show p62 quantification normalized to tubulin; for full 
scan see Supplementary Information Fig. 2. i, Immunostaining of pS6 in 
young and geriatric human satellite cells treated as in h. Scale bar, 75 jum. 
j, Scheme showing the proposed model of how age-impaired autophagy 
leads to muscle stem-cell senescence and regenerative decline. Data show 
mean = s.e.m. Comparisons by two-sided Mann-Whitney U-tests. 

P values are indicated. The number of samples were n = 3 human donors 
per group (a—c); 1 = 60,000 cells analysed from 3 human donors (d), n=3 
human donors per group (f-h). 
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Substantial contribution of extrinsic risk 
factors to cancer development 


Song Wu!?, Scott Powers!?7, Wei Zhu!? & Yusuf A. Hannun?*4° 


Recent research has highlighted a strong correlation between tissue-specific cancer risk and the lifetime number of 
tissue-specific stem-cell divisions. Whether such correlation implies a high unavoidable intrinsic cancer risk has become 
a key public health debate with the dissemination of the ‘bad luck’ hypothesis. Here we provide evidence that intrinsic 
risk factors contribute only modestly (less than ~10-30% of lifetime risk) to cancer development. First, we demonstrate 
that the correlation between stem-cell division and cancer risk does not distinguish between the effects of intrinsic 
and extrinsic factors. We then show that intrinsic risk is better estimated by the lower bound risk controlling for total 
stem-cell divisions. Finally, we show that the rates of endogenous mutation accumulation by intrinsic processes are not 
sufficient to account for the observed cancer risks. Collectively, we conclude that cancer risk is heavily influenced by 
extrinsic factors. These results are important for strategizing cancer prevention, research and public health. 


Cancers were once thought to originate from mature tissue cells that 
underwent dedifferentiation in response to cancer progression’. Today, 
cancers are proposed to originate from the malignant transformation of 
normal tissue progenitor and stem cells”, although this is not wholly 
accepted*, Nevertheless, recent research has highlighted a strong cor- 
relation of 0.81 between tissue-specific cancer risk and the lifetime 
population size in cumulative number of cell divisions of tissue- 
specific stem cells°. However, there has been controversy regarding the 
conclusion that this correlation implies a very high unavoidable risk for 
many cancers that is due solely to the intrinsic baseline population size 
of tissue-specific stem cells”. Many arguments against the ‘bad luck’ 
hypothesis have been made*’, yet none of these have offered specific 
alternatives to quantitatively evaluate the contribution of extrinsic risk 
factors in cancer development. Applying several distinct modelling 
approaches, here we provide strong evidence that unavoidable intrin- 
sic risk factors contribute only modestly (less than ~10-30%) to the 
development of many common cancers. 

We made the conservative and yet conventional assumption that 
errors occurring during the division of cells, being routes of malignant 
transformation, can be influenced by both intrinsic processes as well as 
extrinsic factors (Fig. 1). Intrinsic processes’ include those that result in 
mutations due to random errors in DNA replication, whereas ‘extrinsic 
factors’ are environmental factors that affect mutagenesis rates (such 
as ultraviolet (UV) radiation, ionizing radiation and carcinogens). For 
example, radiation can cause DNA damage, which would primarily 
result in deleterious mutations with functional consequences on cancer 
development only after cell division. Therefore, extrinsic factors may 
act through the accumulation of genetic alterations during cell division 
to increase cancer risk. Accordingly, cancer risk would result from those 
apparently uncontrollable intrinsic processes (Fig. 1, arrow 1) as well 
as from those highly modifiable and thus preventable extrinsic factors 
(Fig. 1, arrow 2). 


Correlation cannot differentiate risks 

According to the above hypothesis, both intrinsic and extrinsic factors 
can impart cancer risk through the accumulation of these errors, espe- 
cially the ‘driver mutations’ (Fig. 1, arrow 3). As such, a correlational 


analysis between cancer risk and cell division, for either stem or non- 
stem cells, is unable to differentiate between the contributions of 
intrinsic and extrinsic factors. This is best illustrated through a thought 
experiment where we consider a hypothetical scenario of a sudden 
global emergence ofa very potent mutagen, such as a strong radiation 
burst from a nuclear fallout, which quadruples the lifetime risks for all 
cancers. In this scenario, it transpires that the proportion of cancer risk 
caused by intrinsic random errors would be small (at most one-quarter 
if we assume all of the original risk was due to intrinsic processes). 
However, if we conduct regression analyses on either the new hypo- 
thetical cancer risks or the current cancer risks as reported, against 
the number of stem-cell divisions®, the correlations from both cases 
would be 0.81 (Fig. 2). This thought experiment negates the ability of 
the correlation to detect solely the contribution of intrinsic factors as 
it cannot distinguish between intrinsic and extrinsic factors. Thus, it 
argues against the implication that around two-thirds of variation could 
be explained by division-related random intrinsic errors. 


Lower bound intrinsic risk line 

The above conclusion then raises the question of what proportion of 
total cancer risk is due to extrinsic versus intrinsic factors. In a data- 
driven approach, we first re-examined the quantitative relationship 
between the observed lifetime cancer risk and the divisions of the 


1 2 
—~> | Stem-cell division | <—— Extrinsic factors 
i 


Figure 1 | Schematic showing how intrinsic processes and extrinsic 
factors relate to cancer risks through stem-cell division. This hypothesis 
maintains the strong role of stem-cell division in imparting cancer risk, but 
it also illustrates the potential contributions of both intrinsic and extrinsic 
factors operating through stem-cell division. Other effects, for example, 
through division of non-stem cells, are considered later in this analysis. 
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Figure 2 | Correlation analysis of stem-cell division and cancer risk 
does not distinguish contribution of extrinsic versus intrinsic factors 
to cancer risk. The black dots are data from figure 1(also shown in 
supplementary table 1) of Tomasetti & Vogelstein’, and the black line 
shows their original regression line. The blue diamonds represent the 
hypothesized quadrupled cancer risks due to hypothetical exposure to 

an extrinsic factor such as radiation. The blue regression line for the 
hypothetical risk data maintains the same correlation as the original black 
line, albeit reflecting a much higher contribution of extrinsic factors to 
cancer risk. 


normal tissue stem cells as reported’, with a distinct alternative method. 
Our rationale was that intrinsic risk, or indeed its upper bound, can 
be better estimated by the lowest boundary on the plots of cancer risk 
versus total tissue stem-cell divisions (Fig. 3a, red ‘intrinsic’ risk line), 
meaning that intrinsic cancer risk should be determined by the cancer 
incidence for those cancers with the least risk in the entire group con- 
trolling for total stem-cell divisions (Fig. 3a, red dots). The argument 
here is that cancers with the same number of stem-cell divisions should 
share the same base of intrinsic cancer risk (if the relationship is causal); 
if one or more cancers would feature a much higher cancer incidence, 
for example, lung cancer among smokers versus non-smokers, then 
this probably reflects additional (and probably extrinsic) risk factors 
(smoking in this case). One could argue that the low-incidence tumour 
types may have lower incidences because of additional genetic repair 
mechanisms that restrict evolving malignant cells from accumulat- 
ing sufficient numbers of genetic alterations required to become fully 
tumorigenic; however, without more specific data on the operation of 
repair mechanisms, these could drive the risk up or down, depend- 
ing on whether they are less or more efficient in any particular tissue. 
According to our hypothesis, intrinsic risk from stem-cell divisions 
would define the lowest bound for a given number of stem-cell divi- 
sions, therefore we define an ‘intrinsic risk line for stem-cell divisions 
by regressing the smallest cancer risks on any given number of stem- 
cell divisions (Fig. 3a, red line). The ‘intrinsic risk lines themselves are 
still probably overestimates for the intrinsic risk; however, we should 
suspect that any cancer risk above that line implies additional biologic 
determinants, on the basis of which we can compute the percentage of 
cancer risk not explained by intrinsic ‘randomness. As shown in Fig. 3a, 
most cancer types have very high excess risks relative to the ‘intrinsic 
risk line, indicating large proportions of risks that are unaccounted for 
by the intrinsic factors, typically larger than 90%. Moreover, these esti- 
mated excess risks are very robust: with plausible measurement errors 
added to the total stem-cell divisions, the resulting excess risks remain 
essentially intact (Extended Data Table 1). 


Extrinsic risks by tissue cell turnover 

Although we performed the initial analysis from a ‘stem-cell theory’ 
point of view, we wanted to evaluate if our results are dependent on 
this specific theory or independent of it. Furthermore, the lack of reli- 
able data on human tissue stem-cell dynamics is a notable concern 
(see Supplementary Information), rendering the analysis in Fig. 3a less 
determinate. Thus, we separately collected data for the total number 
of tissue cell divisions that is based on homeostatic tissue cell numbers 
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and their turnover rates (see Supplementary Information), and ana- 
lysed the relationship of cancer risk versus total tissue cell divisions 
(Fig. 3b). This approach allows for every dividing cell to be a poten- 
tial cancer-initiating cell, which would be an application of another 
cell-of-origin theory of cancer whereby tumours may originate from 
a hierarchy of cells, from stem cells to committed progenitor cells to 
differentiated cells*. Mathematically, this can also be considered as an 
extreme form of stem-cell theory where the fraction of stem cells is 1 
(this latter formulation then provides an upper bound of the effects 
of the size of the stem-cell population on cancer risk and the role of 
extrinsic factors). The regression analysis between cancer risk and 
total tissue cell division shows a high correlation of 0.75, establishing 
a strong quantitative relationship between cancer risk and total cell 
division. To dissect the extrinsic versus intrinsic risks, we applied the 
same rationale and regressed the smallest cancer risks on any given 
number of cell divisions (Fig. 3b, red line). Although we could only find 
reliable turnover data for a subset of tissues, it is remarkable that the 
conclusion drawn here is nearly identical to that in Fig. 3a; that is, large 
proportions of risks that may not be attributable to intrinsic factors are 
mostly higher than 90%. It is important to note that here we included 
breast and prostate cancers—two high-incidence cancers missing in the 
original stem-cell analysis’. Again, plausible measurement errors have 
been added to the total cell divisions, and the excess risks remained 
almost identical (Extended Data Table 1). In summary, irrespective of 
whether a subpopulation or all dividing cells contribute to cancer, these 
results indicate that intrinsic factors do not play a major causal role. 


Epidemiological evidence 

In parallel, numerous epidemiological studies have established 
strong evidence that many cancers have substantial risk proportions 
attributed to environmental exposures (Extended Data Table 2). 
Particularly, for breast and prostate cancers, it has long been observed 
that large international geographical variations exist in their inci- 
dence rates (for example, Western Europe has the highest incidence 
of breast cancer, which is almost 5 times higher than areas such as 
Eastern Asia or Middle Africa; Australia/New Zealand has the highest 
incidence of prostate cancer, which is almost 25 times higher than 
areas such as South-Central Asia)'*, and immigrants moving from 
countries with lower cancer incidence to countries with higher cancer 
rates soon acquire the higher risk of their new country'*'°. While 
several risk factors have been identified for these cancers, no sin- 
gle one can account for their substantial extrinsic risk proportions, 
suggesting complex mechanisms for their aetiologies. Colorectal 
cancer is a high-incidence cancer that is widely considered to be an 
environmental disease’’, with an estimated 75% or more of colorec- 
tal cancer risk attributable to diet!®. For many other cancers, known 
environmental risk factors have also been identified. For example, 
for melanoma the risk ascribed to sun exposure is around 65-86%!°, 
and for non-melanoma basal and squamous skin cancers ~90% is 
attributable to UV radiation”®. At least 75% of oesophageal cancer, 
or head and neck cancer, is caused by tobacco and alcohol*!”. It 
is also well known that certain pathogens may markedly increase 
the risk of cancers. For instance, human papilloma virus may cause 
~90% of cervical cancer cases”*, ~90% of anal cancer cases”* and 
~70% of oropharyngeal cancer cases”; hepatitis B and C may account 
for ~80% of hepatocellular carcinoma cases”®; and Helicobacter pylori 
may be responsible for 65-80% of gastric cancer cases”’. These, along 
with many other reports, provide direct evidence that environmental 
factors play important roles in cancer incidence and they are modifi- 
able through lifestyle changes and/or vaccinations. 

Additionally, analyses of data from the Surveillance, Epidemiology, 
and End Results Program (SEER) in the USA between 1973-2012 
demonstrate that while many cancers have declining or maintain rel- 
atively consistent age-adjusted incidence rates (for example, cervical, 
gallbladder and oesophageal cancers, Extended Data Fig. 1), incidences 
of some cancers (including melanoma, thyroid, kidney, liver, thymus, 
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Figure 3 | Estimation of the proportion of lifetime cancer risk that is not 
due entirely to ‘bad luck. a, b, Estimations based on total tissue stem-cell 
divisions originally reported in Tomasetti & Vogelstein? (a) and total tissue 
cell divisions (b). Red dots are cancers used to compute the ‘intrinsic’ risk 


small intestine, extranodal non-Hodgkin lymphoma, testicular, anal 
and anorectal cancers) have been steadily increasing, and their cur- 
rent incidences are substantially higher than their historical minima 
in the past 40 years** (Extended Data Fig. 1). Moreover, the mortality 
trend of lung cancer from 1930-2011 (ref. 29), which usually mirrors 
its incidence trend, shows a more than 15-fold increase for lung cancer 
risk. These substantial increases in incidence suggest that large risk 
proportions are attributable to changing environments (for example, 
smoking and air pollutants and their role in the risk of developing lung 
cancer). Collectively, nearly all major cancers have been covered in 
these epidemiological studies, further supporting the hypothesis of sub- 
stantial extrinsic risks for most cancers. Notably, most of these cancers 
from the epidemiological and SEER results, except for small intestine, 
are located above the red ‘intrinsic’ risk lines in Fig. 3a, b (blue points). 
Accounting for the external factors would move them closer to the 
proposed ‘intrinsic line, further supporting the conjecture that the 
intrinsic line is mainly defined by cancers without compelling known 
epidemiological risk, whereas those above are at higher risks owing to 
extrinsic factors. 


linear regression lines (red dashed lines). Blue dots are cancers known to 
have substantial extrinsic risks from epidemiology studies. The numbers 
in parentheses are the estimated percentages of cancer risks that are due to 
factors other than intrinsic risks. 


Analysis of mutational signatures 

In addition to epidemiological studies, we evaluated recent studies on 
mutational signatures in cancer. These are regarded as ‘fingerprints’ 
left on cancer genomes by different mutagenic processes*, revealing 
~30 distinct signatures among various cancers*!. Analysis of these sig- 
natures was therefore used to shed light on the proportion of intrinsic 
versus extrinsic origins of cancer. Two signature mutations, 1A/1B 
(see ref. 31), demonstrated strong positive correlations with age in the 
majority of cancers, suggesting that they are acquired at a relatively 
constant rate over the lifetime of cancer patients and thus probably 
result from intrinsic processes; however, all other signature mutations 
(~30) lack the consistent correlations with age, suggesting that they are 
acquired at different rates in life and thus are probably a consequence of 
extrinsic carcinogen exposures”. Indeed, several mutational signatures 
have been linked to known factors such as UV radiation and smoking”. 
We therefore categorized the signatures into intrinsic (type 1A/1B) 
and extrinsic mutations with known or unknown factors, and sum- 
marized their corresponding percentages in Extended Data Table 3. 
Notably, many cancers have substantial extrinsic mutations with 
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Figure 4 | Theoretical lifetime intrinsic risks (tLIR) for cancers based 
on different number of hits (k) required for cancer onset. a, b, The green 
(a) and blue (b) dashed lines are the ‘intrinsic’ risk lines estimated on the 


basis of total reported stem-cell numbers and total homeostatic tissue cells, 
respectively. The intrinsic stem-cell mutation rate (r) is assumed to be 


known factors. More importantly, cancers known to have substantial 
environmental risk proportions, for example, breast cancer’>, prostate 
cancer!®, colorectal cancer!*, melanoma!’, head and neck cancer?!, 
oesophageal cancer’, cervical cancer”’, liver cancer?® and stomach 
cancer”’, all harbour large percentages of total extrinsic mutational 
signatures. This suggests that the percentages of total extrinsic muta- 
tional signatures can serve as a good surrogate for extrinsic cancer 
risks. While a few cancers have relatively large proportions of intrinsic 
mutations (>50%), the majority of cancers have large proportions of 
extrinsic mutations, for example, ~100% for myeloma, lung and thy- 
roid cancers and ~80-90% for bladder, colorectal and uterine cancers, 
indicating substantial contributions of carcinogen exposures in the 
development of most cancers. 


Modelling theoretical lifetime intrinsic risk 

Finally, in another independent model-driven approach to dissect- 
ing the risk contribution of the intrinsic processes, we modelled the 
potential lifetime cancer risk due to intrinsic stem-cell mutation 
errors by varying the number of hits (that is, driver gene mutations), 
denoted by k, required for cancer onset. We derived the probability 
distribution of the propagation of driver gene mutations from one 
generation to the next, and subsequently established the theoretical 
relationship between cell divisions and the degree of lifetime cancer 
risk due to intrinsic cell mutation errors alone, which we refer to as 
the theoretical lifetime intrinsic risk (tLIR). To overcome the limita- 
tion of inaccurate estimation in the reported stem-cell numbers’, we 
calculated tLIR using both the reported stem-cell number (tLIRsc) 
and the total tissue cell number (tLIRtt). The latter is equivalent to 
assuming all homeostatic tissue cells to be stem cells, representing an 
extreme overestimation of tissue stem cells, which consequently leads 
to a conservative estimation of the upper bounds in tLIR. The somatic 
mutation rate in tumours is estimated to be 5 x 10~'° per nucleo- 
tide site per cell division*?~**. On this basis, in our initial calculation 
we used an intrinsic mutation rate (r) of 1 x 10~° per cell division, 
which is equivalent to approximately 20 mutable nucleotide sites for 
each driver gene where the driver gene will mutate if at least one site 
mutates. As shown in Fig. 4a, b, if only one hit (that is, mutation of 
one designated driver gene) is required to develop cancer—that is, 
k =1—the lifetime risk for almost all cancers is close to 100%. This 
confirms that one mutation is not enough for cancer onset (other- 
wise everyone would theoretically acquire each type of cancer). If 
two driver gene mutations are needed, k = 2, the modelled intrinsic 
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1 x 10~ per cell division. The red dashed lines are the ‘intrinsic’ risk lines 
estimated on the basis of the observed data using the same mechanism 

as Fig. 3a. Adjusted (adj.) basal and adjusted melanoma represent cancer 
risks after adjusting for the effect of sun exposure and UV radiation. AML, 
acute myeloid leukaemia. 


risk becomes small for cancers with a small total number of stem-cell 
divisions; however, it is still very large for those with higher stem-cell 
divisions, and even unreasonably large for some cancers by surpass- 
ing the corresponding observed total lifetime cancer risks (adjusted 
basal cell carcinoma, colon adenocarcinoma, adjusted melanoma, 
small intestine cancer, acute myeloid leukaemia and duodenal cancer; 
Fig. 4a). It is therefore unlikely that, at least in these cancers, two hits 
will suffice to induce cancer. As shown in Fig. 4, if we consider the 
more reasonable case where three mutations are required**, k =3, 
almost all modelled intrinsic risks (both tLIRsc and tLIRtt) drop 
well below our earlier ‘intrinsic risk lines estimated conservatively 
from the observed data alone (red dashed lines, estimated based on 
observed data following the same mechanism as Fig. 3a). The life- 
time risk drops even further for k= 4 and beyond. The extrinsic risks 
based on the tLIRsc and tLIRtt are further summarized in Extended 
Data Table 4. This modelling approach demonstrates that cancer risk 
due to intrinsic stem-cell mutation errors alone is low for almost all 
cancers that require over two mutations, indeed it is lower than the 
relatively conservative estimate based on data alone (red lines, Fig. 4). 
As the driver gene mutation rate in stem-cell division is a key para- 
meter, we further conducted sensitivity analyses with different rates 
(r=1x 107! to 1 x 10~°) to examine how this may affect the tLIR 
(Extended Data Figs 2 and 3). The results show that for k= 3, when 
r<1x 107’ (~200 sites for each driver gene hit), almost all mod- 
elled intrinsic risks are below the observed ‘intrinsic risk line (red 
lines); when r= 1 x 107° (~2,000 sites for each driver gene hit), the 
majority of modelled intrinsic risks are still well below the observed 
‘intrinsic’ risk lines, particularly those with small total number of 
divisions (Extended Data Fig. 2). For k=4, when r<1 x 10~°, almost 
all modelled intrinsic risks are below the observed ‘intrinsic’ risk lines 
estimated through the data-driven approach (Extended Data Fig. 3). 
These sensitivity analyses demonstrate that our conclusions are highly 
robust, and that the attribution of intrinsic mutations to lifetime can- 
cer risk through stem-cell divisions, particularly for those cancers 
with low risk, is rather small, even using widely different intrinsic 
mutation rates. 

In summary, we find that a simple regression analysis cannot 
distinguish between intrinsic and extrinsic factors. We have pro- 
vided a new framework to quantify the lifetime cancer risks from 
both intrinsic and extrinsic factors on the basis of four independent 
approaches that are data-driven and model-driven, with and without 
using the stem-cell estimations. Importantly, these four approaches 
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provide a consistent estimate of contribution of extrinsic factors of 
>70-90% in most common cancer types. This is consistent with the 
overall conclusion regarding the role of extrinsic factors in cancer 
development. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Data reporting. No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded to allocation 
during experiments and outcome assessment. 

Derivation of the probability of possessing k hits after n cell divisions for one 
cell. On the basis of the theory of the clonal stem-cell origin of cancer, in a given 
tissue the stem cell would first go through m rounds of symmetric divisions (for 
each division, each stem cell would divide into two daughter stem cells) to reach 
a total of S stem cells (S = 2”) at the steady state. Subsequently, these S stem cells 
would go through a rounds of asymmetric divisions (for each division, each stem 
cell would yield only one daughter stem cell) throughout the lifetime of the tissue. 
This means that the total rounds of lifetime stem-cell divisions per generation 
is n=m +a. Information on the total rounds of symmetric and asymmetric 
divisions as well as the total number of stem cells in the steady state for various 
tissues discussed in this work has been extracted from supplementary table 1 
of Tomasetti & Vogelstein®. With k hits (mutations of k predetermined driver 
genes) on a stem cell required for cancer onset, the number of possible cell states 
of a given stem-cell generation would be k + 1, including a zero state with no 
hit. If we assume that once a hit occurs it cannot be reversed and therefore be 
carried to all progeny cells, then a cell state may only transition from lower to 
higher or equal levels from generation to generation. In Extended Data Fig. 4, we 
demonstrate with k= 3 the state transitions of accumulating driver gene muta- 
tions. Let X, denote the number of driver gene mutations accumulated at gen- 
eration g, and r be the intrinsic driver gene mutation rate due to random errors 
during DNA replication; the transition probabilities to generation g+ 1 with 
i mutations from the previous generation g with j <i mutations are derived as 
follows: 


i. : ja — 1) P(X, =;) 


j=0 


In particular, for the emission state i=0: 


P(Xg41=0) =(1—r)‘ P(X, =0) 


For the absorbing state i= k: 
k : 
P(Xg41 =k) = Yo r* P(X =) 
j=0 


Based on these, the computing algorithm is derived as follows: 
Set the initial cell state at generation 0: 


P(Xy =0) 


1; P(Xp=1) =0;...;P(Xo =k) =0 


For g=1,...,n and 0 <i<k, we compute the following probabilities iteratively: 


k-j 
oy 


ees 


j=0 


pra —r)#P(X~-1=)) 


where 7 is the total number of divisions that one stem cell may experience during 
its lifetime. 


Derivation of the theoretical lifetime intrinsic risk (tLIR) of cancer for a given 
tissue. As mentioned previously, we assume stem cells in a specific tissue undergo 
two phases of divisions (Extended Data Fig. 5): (1) a total of m symmetric divisions 
before full tissue development, and (2) a total of a asymmetric divisions for nor- 
mal tissue turnovers. So in a fully developed tissue, there is a total of S=2” stem 
cells. For each stem cell, the probability of possessing all k hits for cancer onset 
after n= m-+-a rounds of divisions is P(X, =k), which can be calculated from the 
previous part. Therefore, the theoretical lifetime intrinsic risk (tLIR) of developing 
cancer—that is, the probability of at least one stem cell containing k hits during its 
lifetime—can be expressed as: 


tLIR=1—[1— P(X,=k)] 


Estimating cancer risk for different tissues. The rounds of symmetric and asym- 
metric divisions for different tissues were adopted from supplementary table 1 of 
Tomasetti & Vogelstein. In particular, the rounds of symmetric divisions, m, is 
equal to the integer part of log»S, where S is the number of normal stem cells in the 
tissue of origin (data from ref. 5), and the rounds of asymmetric divisions a was 
the column labelled ‘@ in supplementary table 1 of ref. 5. Sensitivity analyses have 
been conducted for scenarios with a broad range of mutation rates, from 1 x 10718 
to 1 x 10~®, and several required hits (k= 1, 2, 3, 4). 
Lower-bound estimates of extrinsic risks with the SEER data. As a program of 
the National Cancer Institute (NCI), SEER (Surveillance, Epidemiology, and End 
Results Program) is a source of information on cancer incidence and survival in the 
USA (http://seer.cancer.gov/). The age-adjusted cancer incidences were extracted 
from the database ‘SEER 9 Regs Research Data, Nov 2014 Sub (1973-2012) 
<Katrina/Rita Population Adjustment>’ using the SEER*Stat 8.2.1 (ref. 28). For 
several cancers, it has been observed that their incidence rates have increased 
markedly during the past 40 years (Extended Data Fig. 1). For these cancers, it 
is reasonable to assume that anything above the historical minimum incidence 
should be attributed to some environmental/extrinsic factors. Therefore, we can 
establish the following inequality: 

Extrinsic risk > (1 — historical minimum incidence rate/incidence rate in 2012). 

Correspondingly, the lower bounds of contributions by extrinsic factors for 
these cancers can be calculated. As shown in Extended Data Fig. 1, some cancers 
show substantial contributions from extrinsic factors. 
Data and statistical analysis. The observed lifetime cancer risks and the cumu- 
lative number of divisions (1) of all stem cells per lifetime are adopted from sup- 
plementary table 1 of Tomasetti & Vogelstein®. The total tissue cell divisions are 
from our evaluation of the data (Supplementary Information). For the robustness 
analysis of Fig. 3 as shown in Extended Data Table 1, error terms following the 
normal distribution with mean 0 and standard deviations of 1 or 0.4 were added 
to the logio(total stem-cell division) or log9(total cell division). These allow the 
number of total stem-cell and cell divisions to vary approximately within a range 
of ~1/100-100-fold or ~1/5-5-fold, respectively. On the basis of the new data 
set with measurement errors, the excess risks for each cancer were quantified. 
This process is repeated 1,000 times, and from this the mean, the 2.5 and the 97.5 
percentiles (namely the 95% confidence intervals) of the excess risk for each cancer 
are tabulated. In calculating the percentage of intrinsic versus extrinsic mutations 
based on mutational signatures from cancer genome, we define the intrinsic muta- 
tion as those with signatures 1A/1B, and extrinsic mutation as all other mutational 
signatures (2-21, R1-R3, U1 and U2). The corresponding data were obtained from 
supplementary figures 59-88 of ref. 31. All statistical analyses and mathematical 
calculations were performed using R (version 3.1.2). 
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Extended Data Figure 3 | Sensitivity analysis of different mutation 
rates on tLIR when the number of hits (k) required is 4. a, b, Theoretical 
intrinsic lifetime risks (tLIR) for cancers have been calculated based on 
five different mutation rates: r=1 x 107!°,1 x 10-°, 1 x 10-8, 1 x 107”, 

1 x 10~°. The red dashed lines are the ‘intrinsic’ risk lines based on the 
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observed data following the same estimation mechanism as the intrinsic 
risk line in Fig. 3a. The green (a) and blue (b) dashed lines are the 
‘intrinsic risk lines estimated based on total reported stem-cell numbers 
and total homeostatic tissue cells, respectively. 
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Extended Data Figure 4 | Intrinsic cancer risk modelling. Part 1 of 2: propagation diagram of driver gene mutation states between generations in one 
stem cell, from which the stem-cell mutation transition probabilities from one generation to the next are computed. 
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theoretical lifetime intrinsic risks (tLIR) for cancer due to k driver gene shows a cancer occurrence as the second stem cell in the last generation 
mutations are computed. Each coloured circle represents the mutation (generation m) that has accumulated all 3 driver gene mutations. 


of a new driver gene in the given stem cell (yellow, first mutation; green, 
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Extended Data Table 1 | Robustness analysis on total stem-cell divisions and cell divisions estimates in Fig. 3 


Total stem-cell divisions (Fig. 3A) Total cell divisions (Fig. 3B) 


Observed] Log10 | Excess Excess risk Excess risk 
Risk |(divisions)| risk 95% Cl” 95% Cl" 


asal cell 
reast 


OAD 
AP COAD 
ynch COAD 
Duodenum’* 
‘AP Duodenum 
sophageal 
allbladder N 


Lung (nonsmoker' | 0.0045 | 9.97 [>0938 [oas5,ose| 152 | - | | 
Medulloblastoma’ [0.00011] 8.43, | - |  - =| NA | NA | NA | 
rms csteosarcoma’™ [aooeos| 66 | - | - | aos9 | - | | 
Head osteosarcoma’ |3.02E-05| 6.78 | - | - | 1141 | - [| - | 
Pelvis osteosarcoma’ |3.00E-05] 6.50 | NA | NA | 1081 | - [| - 
Pancreaticislet! [0.000194] 9.78 | - |  - | NA | NA [NA _ | 

| i422 [| - [| - | 


ual 
= < 
r 

= 


Measurement errors were added to logio(divisions) and 1,000 simulations were carried out to calculate the mean and 95% confidence interval (Cl) of the excess risks. See Methods for details. NA: data 
not available. 

*Confidence interval. 

tCancers used to compute the ‘intrinsic’ risk line based on total stem-cell divisions. 

{Cancers used to compute the ‘intrinsic’ risk line based on total cell divisions. 
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Extended Data Table 2 | Epidemiological studies on the extrinsic risks of various cancers 


Cancer Types Examples of potential extrinsic risk factors. 
Oral contraceptive, hormone replacement thera 

Breast substantial . . oe . p ; PY 
lifestyle (diet, smoking, alcohol, weight) 

Prostate substantial | Diet, obesity, smoking 


Lung Smoking; air pollutant 
Colorectal Diet, smoking, alcohol, obesity 


Melanoma 
Basal cell 
Hepatocellular 
Gastirc 
Cervical 
Head & Neck 
Esophageal Smoking, alcohol, obesity, diet 

Oropharyngeal 
Thyroid 
Kidney 
Thymus 
Small intestine 


Ext dal -Hodgkin' 
alee eee cae >71% Chemicals, radiation, immune system deficiency 
lymphoma (NHL) 


Testis Largely unclear 


Anal and tal 
cancers 


*http://www.cancer.org/cancer. 
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Extended Data Table 3 | Percentages of intrinsic versus extrinsic MS with known and unknown causes in different cancer types 


Intrinsic Extrinsic MS - 
Ms Known Unknown Total 
ool ae 


Glioblastoma 


9.2 
24.9 
[Kidney Papillary | | 5.7 | 
10.9 
9.1 


17.1 


Lymphoma B-cell 
Medulloblastoma | 4.4] | 


7.2 
[Myeloma | 9.9 | 
36.6 


Thyroid 


Intrinsic mutational signatures (MS) includes signatures 1A/B, and extrinsic MS includes signatures 2-21, R1-R3, U1 and U2, excluding signature 11 for Temozolomide, an alkylating agent used for 
chemotherapy. The blue, yellow and red colours highlight cancers that are have substantial extrinsic risk proportions based on epidemiological data, MS with known causes and MS with unknown 
causes, respectively. Data from the supplementary figs 59-88 in ref. 31. 
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Extended Data Table 4 | Percentages of extrinsic risks based on the reported stem-cell estimates and total homeostatic tissue cells, as 
shown in Fig. 4 


Cancer Type k=3 
M 
Basal cell 1.000| 1.000 
7 1.000 0.578 | 1.000 
COAD 0.928 | 1.000 
FAP COAD 0.997 | 1.000 
Lynch COAD 1.000 0.993 | 1.000 


q 
S 


> 
— 
2 
a) 
=) 


fe) 
a 
= 
ray 


el P| 
o/° 
o/S 
o|S 


hb 
j=) 
j=) 
o 


Duodenum 0.986 | 1.000 
FAP Duodenum 1.000 | 1.000 
Esophageal 0.997 | 1.000 
Gallbladder TO. ; 1.000 | 1.000 
lioblastoma 
Head & neck 1.000/1. 
H.T.O. 
1.000 1.000 
Osteosarcoma __|_H..0._|_1.000_|1.000| 1.000 1.000 
1.000 
1.000 


Pe ee i || 
Slo]/el/oI]/o!o 
SIS|S/IS/S/S 
SIS|S/S|S]|S 


1.000 
Pelvis osteosarcoma| HO. | 1.000 |1.000|1.00 1.000 
1.000 
1,000 
0.611_|1.000]1.000 1.000 
.998| 1.00 1,000 
1.000 


Extrinsic risk= 1 —(tLIRsc or tLIRtt)/observed risk. H.T.O., higher than the observed. 
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SMN and symmetric arginine 
dimethylation of RNA polymerase II 
C-terminal domain control termination 


Dorothy Yanling Zhao!*?, Gerald Gish?*, Ulrich Braunschweig", Yue Li!*, Zuyao Ni!, Frank W. Schmitges!, Guoqing Zhong!, 
Ke Liu®, Weiguo Li°, Jason Moffat!*, Masoud Vedadi°, Jinrong Min®, Tony J. Pawson*?, Benjamin J. Blencowe!? & 


Jack F. Greenblatt!? 


The carboxy-terminal domain (CTD) of the RNA polymerase II (RNAP II) subunit POLR24A is a platform for modifications 
specifying the recruitment of factors that regulate transcription, mRNA processing, and chromatin remodelling. Here we 
show that a CTD arginine residue (R1810 in human) that is conserved across vertebrates is symmetrically dimethylated 
(me2s). This R1810me2s modification requires protein arginine methyltransferase 5 (PRMT5) and recruits the Tudor 
domain of the survival of motor neuron (SMN, also known as GEMINI1) protein, which is mutated in spinal muscular 
atrophy. SMN interacts with senataxin, which is sometimes mutated in ataxia oculomotor apraxia type 2 and amyotrophic 
lateral sclerosis. Because POLR2A R1810me2s and SMN, like senataxin, are required for resolving RNA-DNA hybrids 
created by RNA polymerase II that form R-loops in transcription termination regions, we propose that R1810me2s, SMN, 
and senataxin are components of an R-loop resolution pathway. Defects in this pathway can influence transcription 
termination and may contribute to neurodegenerative disorders. 


The CTD of POLR2A contains 52 heptapeptide repeats. The amino- 
terminal half of the CTD comprises repeats that mostly conform to 
the consensus Tyr1-Ser2-Pro3-Thr4—Ser5-Pro6-Ser7, whereas the 
C-terminal half consists of repeats that generally deviate from this 
consensus'. These repeats can be phosphorylated on Tyrl, Thr4, and all 
three serine residues, and specific CTD phosphorylation patterns are 
important for various aspects of chromatin regulation, transcription, 
and co-transcriptional RNA processing”°. Two non-consensus human 
CTD arginine residues, R1603 and R1810, are conserved in vertebrates. 
It was found recently that asymmetric dimethylation (me2a) of R1810 
by the CARM1 (also known as PRMT4) methyltransferase inhibits the 
expression of small nuclear RNA (snRNA) and small nucleolar RNA 
(snoRNA) genes in human cells’. It was also shown that this R1810 
me2a mark can be bound in vitro by the Tudor domain of TDRD3 
(ref. 7). At the c-Myc promoter, asymmetric histone arginine dimethyl- 
ation by PRMT1 and CARMI recruits TDRD3 and TOP3B to suppress 
R-loop accumulation’. We now show that R1810 can be symmetri- 
cally dimethylated, a modification requiring PRMT5. This R1810me2s 
modification recruits SMN, which then interacts with senataxin, a 
helicase needed for resolving R-loops in transcriptional termination 
regions. 


R1810me2s on the RNAP II CTD 

Motivated initially by the question of whether the R1810me2a modi- 
fication is recognized by TDRD3 in vivo, we performed immunopre- 
cipitation analysis using tagged TDRD3 and the RNAP II POLR2D 
subunit. We observed that both tagged proteins could co-immuno- 
precipitate me2a-modified POLR2A, as detected by western blotting 
with the ASYMM24 antibody specific for Arg-me2a. In contrast, 
only POLR2D, and not TDRD3, co-immunoprecipitated a form 


of POLR2A with an me2s modification that could be detected by 
SYMM10 or Y12 antibodies specific for Arg-me2s (Extended Data 
Fig. la). To determine whether R1810 is indeed dimethylated sym- 
metrically, as well as asymmetrically, we generated polyclonal anti- 
bodies against an R1810me2s-containing 7-mer peptide and found 
that immunoprecipitated POLR2A is recognized by this R1810me2s 
antibody (Fig. 1a). To determine whether the Arg-me2s modification 
involved R1810, Raji cells stably expressing a-amanitin-resistant, 
HA-tagged, wild-type or R1810A mutant POLR2A were generated’. 
After treatment with a-amanitin to deplete endogenous a-amanitin- 
sensitive RNAP I, followed by immunoprecipitation of RNAP II 
with anti-HA antibody, western blotting with antibodies recognizing 
R1810me2a’ or R1810me2s, as well as with Y12 antibody, revealed 
that the R1810A mutation results in the loss of both R1810me2a and 
R1810me2s modifications (Fig. 1b and Extended Data Fig. 1b). The 
precipitated RNAP II was dephosphorylated before western blotting 
to enable more sensitive detection of R1810me2s (Extended Data 
Fig. 1c). The Y12 and R1810mez2s antibodies recognize an Arg-me2s 
peptide bracketing CTD R1810 much better than peptides with no 
arginine modification or R1810mez2a (see slot blots of Extended 
Data Fig. 1d). Therefore, R1810 is symmetrically dimethylated in cell 
extracts. Extended Data Fig. 1a also shows that TDRD3 recognizes 
R1810mez2a in cell extracts, as well as in vitro’, although TDRD3 does 
not mediate inhibition of snRNA and snoRNA gene expression by 
R1810me2a’. 


R1810me2s modification requires PRMT5 

Although PRMT9 can symmetrically dimethylate certain substrates, 
PRMT5 is the predominant methyltransferase that symmetrically 
dimethylates arginine in human cells’®''. As the PRMT5-WDR77 
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Figure 1 | Symmetric dimethylation of R1810 on the RNAP II CTD 
and a requirement for PRMT5. a-d, Western blots, with relative 
quantifications underneath (in b-d), of immunoprecipitations (IP) from 
HEK293 (a, ¢, d) or Raji (b) whole-cell lysates (WCL) after stable (sh) 

or transient (small interfering RNA (si)) knockdowns (c, d). Anti- HA 

was used to precipitate HA-tagged POLR2A in b. e, f, Quantification 

of methylation of GST-CTD fusion proteins shown in the Coomassie- 
stained SDS gel of the insert (e), or 13-mer CTD peptides (f) with various 
concentrations of recombinant PRMT5-WDR77. Error bars denote s.e.m. 
(n=3 biological replicates). 


complex associates with RNAP II through the CTD phosphatase 
FCP1 (ref. 12), which is consistent with our observation that PRMT5 
co-purifies with RNAP II (Fig. 1a), we tested whether PRMT5 might 
be needed to symmetrically dimethylate R1810. HEK293 cell lines 
stably expressing shRNAs for CARM1, PRMT5, and GFP were gen- 
erated, and endogenous RNAP II was precipitated. Western blotting 
revealed that CARM1 knockdown causes loss of the R1810me2a, but 
not the R1810me2s mark on POLR2A, whereas PRMT5 knockdown 
causes loss of the R1810me2s, but not the R1810me2a mark (Fig. 1c 
and Extended Data Fig. le, f). Transient siRNA-mediated knock- 
down of PRMT5 consistently also reduced R1810me2s, whereas 
overexpression of Flag-tagged PRMT5 increased R1810me2s (Fig. 1d 
and Extended Data Fig. 1g). These experiments indicated that 
PRMTS5 is required in vivo for R1810me2s modification of the 
RNAP II CTD. 

PRMTS5 is the catalytic subunit of the methylosome, which also 
contains WDR77 (also known as MEP50)!°". To test whether 
PRMT5 can directly methylate CTD arginine residues, we incubated 
recombinant PRMT5-WDR77 with tritiated S-adenosyl methio- 
nine ([7H]SAM) and recombinant GST-N-CTD, which contains 
CTD repeats 1-29 and includes R1603, or GST-C-CTD, which con- 
tains CTD repeats 24-52 and includes R1810 (GST was linked to 
the N-terminus of the CTD fragment in both cases). Scintillation 
counting was then used to monitor [*H]SAM labelling following 
glutathione-agarose pull-down of the GST fusion proteins. Both 
GST-C-CTD and GST-N-CTD were methylated above the back- 
ground (GST alone, which contains 14 arginines) (Fig. le). When the 
PRMT5 methylation assays were repeated with biotinylated 13-mer 
peptides containing R1810 or R1603, methylation was again observed 
on both R1603 and R1810 (Fig. 1f). Therefore, PRMT5-WDR77 is 
needed to symmetrically dimethylate R1810 in vivo and can directly 
methylate R1603 and R1810 in vitro, but may require an additional 
co-factor’? and/or appropriate CTD phosphorylation to specifically 
methylate R1810. 
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Figure 2 | R1810me2s in the RNAP II CTD is recognized by SMN. 

a, Fluorescence polarization assays showing binding of SMN to 
FITC-labelled CTD peptides containing R1810. b, Isothermal titration 
calorimetry assays showing binding of SMN to a CTD peptide containing 
R1810me2s. c, d, Western blots, with relative quantifications underneath, 
of immunoprecipitations (IP) from HEK293 (d) or Raji (c) whole-cell 
lysates after various stable (sh) knockdowns (d). Anti-HA was used to 
precipitate HA-tagged POLR2A in c. e-g, Quantification of quantitative 
(qPCR) data from ChIP experiments at the indicated ACTB primer 
positions in HEK293 (e, f) or Raji (g) cells, along with effects of knocking 
down (KD) PRMT5 or SMN (f) or mutating R1810 to alanine (g). Error 
bars denote s.e.m. (n= 3 biological replicates). 


Recognition of R1810me2s by SMN 

Modified CTD residues and dimethylated arginine residues usually 
mediate their biological effects via interacting proteins, in the latter case, 
Tudor domain proteins'>!°. Therefore, we sought a protein that recog- 
nizes R1810me2s. Because the Tudor domains of SMN and SPF30 (also 
known as SMNDC1), as well as those of TDRD1, TDRD2, TDRD9, 
and TDRD11, specifically bind Arg-me2s'°, we used them in fluores- 
cence polarization assays to identify whether any could bind an FITC- 
tagged 13-mer CTD peptide containing R1810me2s. Of these, only the 
Tudor domain of SMN exhibited binding, with much lower affinity 
for R1810me2a and R1603me2s peptides and no detectable affinity 
for the unmodified peptides (Fig. 2a and Extended Data Fig. 2a, c). 
In contrast, the TDRD3 Tudor domain showed weak affinity only for 
R1810me2a > R1603mez2a above background (no modification or 
Arg-me2s), consistent with published data (data not shown)”!”. 
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Figure 3 | R1810me2s and SMN recruit senataxin to RNAP II. 

a-e, Western blots, with relative quantification underneath, of 
immunoprecipitations (IP) from HEK293 whole-cell lysates after stable 
(sh) knockdowns of the indicated proteins (c-e). Cells were grown for 

3 days in the presence of «-amanitin in the right panel of b to eliminate 
endogenous POLR2A (western blot of left panel). f-h, Quantification 

of qPCR data from ChIP experiments at the indicated ACTB primer 
positions (see Fig. 2) in HEK293 (f, h) or Raji (g) cells, along with effects of 
knocking down (KD) PRMT5, SMN or senataxin (h) or mutating R1810 to 
alanine (g). Error bars denote s.e.m. (n = 3 biological replicates). 


Compared to a CTD peptide with R1810me2s alone, the presence of 
additional phospho-Tyr1 or phospho-Ser2 modifications, or both, on 
the peptide only slightly enhanced its binding to SMN in fluorescence 
polarization assays and had no significant effect in isothermal titra- 
tion calorimetry assays (Fig. 2b and Extended Data Fig. 2b, d). Other 
phosphorylations near R1810 also had no effect on SMN binding 
in vitro (data not shown), indicating that the association of SMN with 
R1810me2s is not greatly influenced by CTD phosphorylation. 

Using coimmunoprecipitation, we found that SMN and POLR2A 
interact (Fig. la). Consistent with specific recognition of the 
R1810me2s modification by SMN, immunoprecipitation of SMN from 
HEK293 cell extracts co-precipitated endogenous POLR2A with the 
R1810me2s modification (Fig. 1a). To test whether R1810 is important 
for the association of RNAP II with SMN in vivo, HA-tagged wild-type 
or mutant (R1810A) POLR2A was immunoprecipitated with anti-HA 
antibody. Western blotting showed that the R1810A mutation disrupts 
the association of RNAP II with both SMN and TDRD3 (Fig. 2c). As 
expected, immunoprecipitation of endogenous RNAP II from HEK293 
cells expressing shRNAs for GFP, CARM1 or PRMTS5 revealed that 
only knockdown of PRMT5 reduced co-precipitation of SMN (Fig. 2d). 
Similarly, transient siRNA-mediated knockdown of PRMT5 reduced 
the RNAP II-SMN interaction, whereas overexpression of PRMT5 
enhanced it (Extended Data Fig. 3a, c). 

The experiments described above reveal that R1810me2s is impor- 
tant for the SMN-RNAP II interaction. To examine whether SMN 
also displays R1810me2s-dependent association with RNAP I during 
transcription, SMN chromatin immunoprecipitation (ChIP) was per- 
formed. As shown in Fig. 2e and Extended Data Fig. 4a, SMN associ- 
ates with the B-actin (ACTB) and GAPDH genes from their promoter 
regions to their termination regions. Moreover, PRMT5 knockdown 
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with shRNA (Fig. 2f and Extended Data Fig. 4b) or siRNA (not shown) 
strongly reduced the SMN ChIP signals along these genes, as did 
shRNA for SMN (Fig. 2f), showing the specificity of the SMN antibody 
for ChIP. The SMN ChIP signals on ACTB and GAPDH decreased 
upon expression of the POLR2A (R1810A) mutant, compared to wild 
type, after «-amanitin treatment to eliminate endogenous POLR2A 
(Fig. 2g and Extended Data Fig. 4c). These experiments indicated that 
R1810me2s recruits SMN to RNAP II elongation complexes. Consistent 
with our observation that SMN is recruited from promoters to 3’ ends, 
we found that immunoprecipitation of SMN co-precipitated RNAP II 
containing CTD phosphorylation on Ser2 and Ser5, modifications that 
are associated with RNAP II complexes engaged in 5’-end formation 
and elongation, respectively (Extended Data Fig. 3b). SMN ChIP-seq 
showed SMN occupancy only at promoter and termination regions, 
probably because SMN does not bind to DNA directly (Extended Data 
Fig. 3d). 


SMN interacts with senataxin 

To further understand the relationships of R1810me2s and SMN to 
transcription, we noted that SMN is known to interact with senataxin'®, 
a DNA-RNA helicase encoded by the SETX gene that is important for 
termination by RNAP II'*!°. We confirmed that SMN indeed interacts 
with senataxin (Fig. la, 3a, 3b); the interaction is apparently not medi- 
ated by RNAP II, as it persists after 3 days of treatment of cells with 
a-amanitin to eliminate the bulk of the RNAP II (Fig. 3b). Moreover, the 
SMN-senataxin interaction is reduced when PRMT5 is stably knocked 
down, indicating that the interaction is probably mediated by the Tudor 
domain in SMN and an Arg-me2s modification on senataxin (Fig. 3c). 
Senataxin also co-precipitated with HA-tagged, «-amanitin-resistant, 
wild-type RNAP II, and the interaction was reduced by the R1810A 
mutation (Fig. 2c), indicating that the senataxin-RNAP I interaction 
is likely stabilized by SMN. Consistent with this, in cells with stable 
shRNA-mediated or transient siRNA-mediated knockdown of PRMT5 
or SMN, the association of senataxin with RNAP II was reduced 
(Fig. 3d, e and Extended Data Figs 1h-j and 3c). 

Therefore, we used ChIP to test whether SMN and R1810 are 
important for senataxin recruitment to elongating transcription com- 
plexes. Senataxin ChIP in Raji cells depleted of endogenous POLR2A 
and expressing wild-type or R1810A mutant POLR2A revealed that 
R1810 is important for senataxin recruitment throughout the ACTB 
and GAPDH genes, including their termination regions (Fig. 3g and 
Extended Data Fig. 4d, e). SMN or PRMT5 knockdown revealed that 
both are important for senataxin recruitment (Fig. 3h and Extended 
Data Fig. 4f). Knockdown of senataxin itself showed that the senataxin 
antibody used for ChIP was specific for senataxin (Fig. 3h). Because 
senataxin is a termination factor!*!°, these experiments showed that 
POLR2A (R1810), SMN, and PRMT5 should also be important for 
termination by RNAP IL. Consistent with this, XRN2, a 5/3’ exonu- 
clease involved in termination by RNAP II”°, also co-precipitated with 
SMN and RNAP II (Fig. la). 


SMN and R1810me2s regulate termination 

To investigate whether SMN and R1810me2s are important for R-loop 
resolution and termination, we carried out ChIP on a-amanitin-resistant 
wild-type or (R1810A) mutant POLR2A after depleting endogenous 
a-amanitin-sensitive POLR2A. There was enrichment of the R1810A 
mutant over wild-type RNAP II downstream of the cleavage and pol- 
yadenylation sites where RNAP II pauses and terminates transcription 
on ACTB (Fig. 4b, d and Extended Data Fig. 5) and GAPDH (Extended 
Data Fig. 6c, d), as detected by various anti-POLR2A monoclonal anti- 
bodies (4H8, H224, 8WG16). The effect of R1810 on RNAP II accumu- 
lation in termination regions was confirmed by carrying out a variant of 
the global run-on procedure in which nuclei are incubated with BrUTP 
and short run-on RNAs are isolated by binding to anti-BrU antibodies". 
This experiment also indicated that the R1810A mutation leads to 
over-accumulation of active RNAP II downstream of the poly(A) sites 
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Figure 4 | SMN and R1810me2s regulate transcription termination 
by RNAP II. a-d, Quantification of RNAP II qPCR data from ChIP 
experiments using POLR2A antibodies (4H8, 8WG16, N20) (a, b, d) or 
nuclear run-on (c) in HEK293 (a) or Raji (b, c) cells, using the indicated 
ACTB primer positions, after stably knocking down (sh) PRMT5, SMN, 
or senataxin (a), or mutating R1810 to alanine (b). Error bars denote s.e.m. 
(n=5 biological replicates for a, b, d; n =3 biological replicates for c). 

e, RNAP II ChIP-seq in Raji cells (average density of ChIP fragments per 
million in the library for the 5% most highly expressed genes) for SMN 
(purple) versus GFP (grey) stable knockdowns (sh), or R1810A (red) 
versus wild-type (blue) POLR2A. 


on ACTB and GAPDH (Fig. 4c and Extended Data Fig. 6e). When sen- 
ataxin is stably knocked down, RNAP II similarly over-accumulates in 
the termination region of ACTB"® (Fig. 4a), suggesting that senataxin 
and POLR2A R1810 are both needed for termination and release of 
RNAP II from DNA at the pause sites. Consistent with the idea that it 
is the lack of R1810me2s modification that leads to a failure of RNAP II 
release, stable shRNA-mediated or transient siRNA-mediated knock- 
down or CRISPR-mediated knockout of SMN or PRMT5 led to similar 
over-accumulation of RNAP II in termination regions (Fig. 4a, d and 
Extended Data Figs 6b, d, 7 and 8a). We then performed ChIP-seq of 
RNAP IJ for shSMN versus shGFP, and POLR2A (R1810A) versus wild- 
type POLR2A samples. When compared to the control, SMN knock- 
down and the R1810A mutation on the CTD both led to a genome-wide 
RNAP II accumulation in termination regions of active genes (Fig. 4e 
and Extended Data Fig. 9). 

Formation of R-loops by elongating RNAP II over pause sites 
downstream of poly(A) signals and their resolution by the senataxin 
helicase are important for recruiting the 5’—3’ exonuclease XRN2 
and termination by RNAP II!”. The monoclonal antibody $9.6 binds 
RNA-DNA hybrids (Fig. 5a)!*. When we used this antibody for DNA 
immunoprecipitation (DIP), we found that, like depletion of senataxin 
(Fig. 5b), mutation of R1810 (Fig. 5c, d) or knockdown of SMN or 
PRMT5 (Fig. 5b, d) led to an over-accumulation of R-loops in the 
termination region on the ACTB gene. A second method for R-loop 
detection employing a GFP fusion construct that includes the RNase 
H1 R-loop-binding domain (GFP-HB) was also used”. Between 
control and SMN CRISPR-mediated knockout cells that stably express 
GFP-HB, increased R-loop formation was detected in the termination 
regions in SMN knockout cells by GFP ChIP (Extended Data Fig. 8b). 
These experiments indicated that R1810me2s, PRMT5, and SMN lie 
upstream of senataxin in a common pathway important for R-loop 
resolution and transcription termination. 

To determine whether the phenotypes for RNAP II and R-loop 
accumulation in the termination region may be relevant to the spinal 
muscular atrophy (SMA) disease state, fibroblast and B-lymphocyte 
cell lines were obtained for two children with SMA and their unaf- 
fected parents (Extended Data Fig. 8d). RNAP II ChIP and R-loop 
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Figure 5 | SMN and R1810mez2s are important for resolving R-loops 
created by elongating RNAP II and preventing DNA damage. 

a-d, Quantified DNA immunoprecipitation along ACTB using S9.6 antibody 
in HEK293 cells (a, b), with stable knock-down (short hairpin RNA, sh) of 
SMN, PRMT5 or senataxin (b), or treatment with RNase H (a), or in Raji 

cells after mutating R1810 to alanine (c). Error bars denote s.e.m. (n=3 to 

5 biological replicates). e, f, Quantified ACTB gene qPCR data from yH2AX 
ChIP experiments in HEK293 (e) or Raji (f) cells, after knocking down (sh) 
PRMT5 or SMN (e) or mutating R1810 to alanine (f). Error bars denote 
s.e.m. (n= 4 biological replicates). 


DIP showed that, compared to the control (average of the two 
parents), RNAP II and RNase H-sensitive R-loops tend to accumulate 
in the termination region of the ACTB gene in the SMA cells (Extended 
Data Fig. 8e, f). 

Next, we examined why R-loop accumulation in termination regions 
might contribute to the neurodegeneration characteristic of SMA. By 
using RNA-seq to examine the effects of the POLR2A R1810A muta- 
tion, we confirmed that there are few gene expression changes aside 
from the previously noted upregulation of various snoRNAs and 
snRNAs’ (data not shown). The GEMIN-containing SMN complex 
assembles various snRNPs that participate in spliceosome assembly**™, 
and splicing deficiencies may account for SMA in a mouse model”>”’. 
Similarly, the R1810me2s-SMN-SETX pathway may prevent SMA, 
at least in part, by affecting splicing, and more extensive analysis of our 
RNA-seq data revealed that the R1810A mutation causes many splicing 
alterations (data not shown). However, it is not clear whether these 
splicing changes are caused by upregulation of snRNA genes, changes 
in RNAP II elongation kinetics through the R1810me2s-SMN-SETX 
pathway’, or by recruitment by SMN of TDP-43 (also known as 
TARDBP) and FUS, which are known to interact with SMN*?*! and 
affect splicing*?-*°. 

Another possibility is that genome instability caused by less efficient 
removal of R-loops** may contribute to neurodegeneration. To test 
whether 3’-end R-loop accumulation can lead to DNA damage, we 
used a ChIP to assay the effects of the POLR2A R1810A mutation and 
the depletion of SMN or PRMT5 on the presence of yH2AX, which 
accumulates at sites of DNA damage*”” (Fig. 5e, fand Extended Data 
Fig. 10a, b). As expected, we observed accumulation of YH2AX and 
increased YH2AX:H2AX ratio in the ACTB gene termination region 
where RNAP II and R-loops accumulate. Therefore, accumulation of 
termination region R-loops may generally lead to DNA damage and 
genome instability, as is the case when senataxin is depleted*”. 


Discussion 

The finding that R1810 in the RNAP II CTD is symmetrically dimeth- 
ylated by PRMTS5 is substantiated by many observations described here. 
First, immunoprecipitated RNAP II is recognized by two different anti- 
bodies in a western blot, SYMM10 and Y12, specific for Arg-me2s, as 
well as antibodies raised against an R1810me2s peptide and specific for 
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the Arg-me2s modification. Second, the SDS gel mobility of this band 
is the same as that of the largest subunit of RNAP II. Third, recogni- 
tion of the modification by the antibodies depends on R1810. Fourth, 
the modification recognized by the antibodies depends on PRMT5. 
Fifth, POLR2A R1810 and PRMT5 are needed for the association with 
RNAP IL and the recruitment to transcribed regions of SMN, whose 
Tudor domain is specific for Arg-me2s. Symmetric dimethylation of 
R1810 in the POLR2A CTD causes the direct recruitment of SMN and 
indirect recruitment of the RNA-DNA helicase senataxin. This is fol- 
lowed by R-loop resolution and, in termination regions, recruitment 
of the 5’+3’ exonuclease XRN2 and efficient termination by RNAP II 
(see model in Extended Data Fig. 10c). Mutation of R1810 or depletion 
of SMN or PRMT5 may also lead to RNAP II accumulation in promoter- 
proximal regions, perhaps because some RNAP II molecules that pause 
downstream of promoters”! terminate prematurely in a process that 
depends on R1810me2s and SMN. 

POLR2A R1810 is both symmetrically dimethylated by PRMT5 and 
asymmetrically dimethylated by CARM1 (ref. 7), but it is not uncom- 
mon for type I and type II PRMTs to compete for deposition of me2a 
and me2s, respectively, on the same substrates**. Other examples of 
alternative modifications of the same Arg residue are PRMT5 antago- 
nizing PRMT1-dependent activation of the histone H4R3me2a mark 
by depositing repressive me2s marks on histones H4R3 and H3R8 
(refs 16, 40, 41), and alternative modifications of R698 of the elongation 
factor SPT5 by PRMT1 and PRMTS5 to regulate its role in RNAP II 
elongation”. 

SMN, which recognizes R1810me2s, can self-aggregate through 
its N-terminal K-rich domain and C-terminal YG box, amplifying its 
potential for Arg-me2s-mediated protein-protein interactions***“. 
Although the human genome contains two SMN genes (SMNI1, SMN2), 
mutating one copy of SMN1 decreases total SMN protein enough to 
cause SMA, with disease severity reflecting insufficiency of the remain- 
ing SMN for its full range of functions”*. 

SMN oligomers facilitate snRNP assembly by binding Arg-me2s 
modifications on Sm proteins***4, Many RNAP II elongation and 
termination factors and proteins implicated in neurodegeneration also 
contain dimethylated Arg, including XRN2, three subunits of CPSF 
(CPSF1, CPSF5, and CPSF6), CSTF2, three poly(A)-binding pro- 
teins (PABP1, PABP 2, and PABP4), RBBP6, WDR33, PCF11, SPT5, 
CTDP1, DHX9, FUS, and EWSR1***”. It is possible, therefore, that 
SMN may help assemble an R-loop resolving complex on the RNAP 
II CTD by binding several Arg-me2s-modified termination factors. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


The number of times each of the experiments described in this work was performed 
and the raw western blots used in the figures are shown in the Supplementary 
Information file. The investigators were not blinded to allocation during experi- 
ments and outcome assessment. 

In vitro PRMT5 methyltransferase assays. In vitro methylation assays (3011) con- 
taining 100,1M of biotinylated 13-mer CTD peptides centred around R1810 or 
R1603, 0.1-2 14M PRMT5-WDR77 complex, 100M tritiated SAM (PerkinElmer, 
catalogue number NET155V250UC), 20mM Tris-HCl pH8, 0.01% TritonX-100 
(Sigma catalogue number T8532), and 5mM DTT were incubated at room tempera- 
ture for 2h. The biotinylated peptides were then precipitated with 20 1l streptavidin- 
agarose (Invitrogen catalogue number SA100), washed with buffer (above), mixed 
with 5 ml ScintiVerse BD Cocktail (Fisher Chemical, catalogue number SX18-4), 
and counted using a Beckman Liquid Scintillation Counter (LS 6500). Other 
similar reactions contained recombinant GST proteins (GST alone, GST-N-CTD 
or GST-C-CTD), 0.1-2}1M PRMT5-WDR77 and 100.M tritiated SAM. These 
reactions were precipitated with glutathione beads (Invitrogen catalogue number 
G2879), and the beads were washed and eluted with 500,11 of 20mM 1-glutathione 
(pH 8) for counting. 

Peptides, GST recombinant proteins, recombinant Tudor domains, fluores- 
cence polarization and isothermal titration calorimetry assays. FITC- and 
biotin-labelled CTD peptides containing R1603 (SPAYEPRSPGGYT) and 
R1810 (YSPSSPRYTPQSP) were prepared on a Prelude peptide synthesizer 
(Protein Technologies, Tucson Arizona) using Fmoc (9-fluorenyl methoxycar- 
bonyl) solid-phase chemistry. Dimethyl arginine derivatives were prepared using 
Fmoc-SDMA(Boc)2-ONa or Fmoc-ADMA (Pbf)-OH reagents (Novabiochem, 
Germany). Peptides were purified using C18 reverse-phase HPLC and authenti- 
cated using mass spectrometry. 

Constructs for GST recombinant protein expression (GST-N-CTD: contains 
repeats 1-29; or GST-C-CTD: contains repeats 24-52) were expressed in BL21 
bacteria and purified following the standard glutathione bead purification pro- 
tocol. Bacterial expression constructs and purified Tudor domains from TDRD3, 
SMN, SPF30, TDRD1, TDRD2, TDRD9 and TDRD11 (also known as SND 1) were 
described previously”*. 

Fluorescence polarization assays were carried out as described before**. The 
buffer used in the fluorescence polarization assay was 20 mM Tris pH 7.5, 50 mM 
NaCl, 1 mM DTT and 0.01% Triton X-100. An excitation wavelength of 485 nm 
and an emission wavelength of 528 nm were used. The data were obtained at 25°C 
and corrected by subtracting the label-free peptide background. The data were 
collected by the Synergy 2 (BioTec, USA) fluorescence polarization program and 
were fitted to a one-site binding model using Origin 7 (MicroCal). The Ky values 
are from the average of three measurements. 

For isothermal titration calorimetry, the concentrated protein was diluted in 

20mM Tris, pH 7.5, 150mM NaCl. The lyophilized peptides were dissolved in the 
same buffer and pH was adjusted by adding NaOH. Peptide concentrations were 
estimated from the molecular weight. All the measurements were performed at 
25°C, using a VP-ITC microcalorimeter. Protein with a concentration of 501M 
was placed in the cell chamber, and the peptides with a concentration of 1 mM 
in syringe were injected in 25 successive injections with a spacing of 180s anda 
reference power of 13 mcal s !. Data were fitted using the single-site binding model 
within the Origin software package (MicroCal). 
Cell culture, shRNA knockdowns, siRNA knockdowns, CRISPR knockouts, 
and electroporation. There was no evidence of mycoplasma contamination of 
the cell lines used in this work as judged by staining of fixed cells with DAPI. Raji 
cells were cultured in RPMI (SLRI media facility) plus 10% FBS (Sigma catalogue 
number F1051) and 1% glutamate, and stably transduced cells were maintained 
with 500 1g ml“! G418 (Gibco catalogue number 11811031). HEK293 cells were 
grown in DMEM (SLRI media facility) plus FBS (Sigma catalogue number F1051), 
and stably transduced cell lines were maintained with 2\.g ml! puromycin (Sigma 
catalogue number p8833). shRNAs in lentivirus vectors were used to stably trans- 
duce cell lines using an established protocol’. siRNA knockdowns for HEK293 
cells were performed with 50 nM SMARTpool siRNAs with PepMute siRNA trans- 
fection reagent (SigmaGen Laboratory catalogue number SL100566) for 3 days. 
SMARTpool On-Target plus siRNAs against human PRMT5 (catalogue number 
L-015817) and SMN (catalogue number L-011108) were purchased from Thermo 
Scientific. 

For CRISPR-mediated gene knockouts, CRISPR/Cas9 plasmids (pCMV-Cas9- 
GFP) were purchased from Sigma-Aldrich which express scrambled guide RNA, 
or guide RNA that targets the SMN1 gene. 2 1g of the plasmids were transfected 
into HEK293 cells, and 1 day after transfection, cells were sorted by BD FACSAria 
flow cytometry (Donnelly Centre, University of Toronto) and single GFP-positive 
cells were plated into a 48-well plate. The expression levels of SMN in each clone 
were detected by western blotting. 


Raji cells with stable expression of HA-tagged wild type or POLR2A (R1810A) 

constructs were generated by electroporation (101g of plasmid DNA per 10’ cells), 
followed by selection and maintenance with G418 (0.5 mg ml). «-amanitin 
treatment was carried out with 24g ml! a-amanitin for 3 days for Co-IP and ChIP 
experiments involving HA-tagged wild-type or POLR2A (R1810A). The transfec- 
tion of the GFP-HB transgene for R-loop detection into HEK293 cell lines was per- 
formed with the FuGENE Transfection reagent (Roche, catalogue number E269A). 
Disease cell lines. SMA disease relevant and control fibroblast and B-lymphocyte 
cell lines were obtained from the Coriell Institute (Family 553: GM03813, 
GM03814, GM03815; Family 3042: GM23686, GM23687, GM23688), and were 
grown in conditions as instructed by the Coriell Institute. The cells were collected 
and fixed for RNAP II ChIP and R-loop DIP. 
Immunoprecipitation (IP) and western blots. IP was performed with RIPA 
buffer (140 mM NaCl, 10mM Tris pH 7.6-8.0, 1% Triton, 0.1% sodium deoxy- 
cholate, 1mM EDTA) containing protease inhibitors (Roche catalogue number 
05892791001) and benzonase (Sigma E1014). 10” to 2 x 107 cells were lysed on 
ice for 25 min by vortexing and forcing them through a 27 gauge needle. After 
centrifuging at 13,000 r.p.m. for 10 min at 4°C, the supernatant was incubated 
with 2511 (1:10 dilution) of protein G beads (Invitrogen catalogue numbers 
10-1243 and10003D) and 1-2 :g of antibodies for 4h to overnight. The samples 
were washed 3 times with RIPA buffer and boiled in SDS gel sample buffer. To 
detect R1810me2s or R1810me2a modifications on POLR2A, alkaline phosphatase 
(Roche catalogue number 10108138001) treatment (5 11) at 37°C for 30 min was 
performed for POLR2A immunoprecipitated samples before boiling. Samples were 
run using 7.5-10% SDS-PAGE and transferred to PVDF membranes (Bio-Rad 
catalogue number 162-0177) using a trans-blot semi-dry electrophoretic transfer 
Cell (BioRad catalogue number 170-3940). Primary antibodies were used at 
1:250 to 1:1,000 dilutions for incubation overnight, and horseradish peroxidase- 
conjugated goat anti-mouse, anti -rabbit, or anti-rat secondary antibodies were 
used at 1:10,000 (Dako catalogue number P0450). Blots were developed using 
SuperSignal West Pico or Femto (Thermo Scientific catalogue numbers 34079 and 
34094). Blots were quantified using Image] software. 

A Hoefer slot blot system (Fisher Scientific catalogue number 11509543) was 

used to assay R1810me2s antibody specificities following the manufacturer’s 
protocol. 
Chromatin immunoprecipitation (ChIP) and DNA immunoprecipitation 
(DIP). ChIP was performed using the EZ-ChIP A chromatin immunoprecipitation 
kit (Millipore catalogue number 17-371) or similar homemade solutions according 
to the manufacturer's instructions. Antibodies were used in the 1-2 1g range, and 
IgG was used as a background control. DIP was performed according to ref. 50 
with minor modifications. DIP was performed following the ChIP protocol except 
that, after the nuclear lysis and sonication, genomic DNA was de-crosslinked in 
ChIP elution buffer containing 5M NaCl at 65°C overnight. DNA was purified 
with the Qiaex II kit (Qiagen catalogue number 20021) for PCR product purifica- 
tion and eluted in water. DIP was carried out overnight with 25 11 of Dynabeads 
protein G beads (Invitrogen catalogue number 100-03D) and 1 j1g of antibody 
purified from the $9.6 hybridoma cell line*! that recognizes RNA-DNA hybrids. 
Immunoprecipitated and input DNAs were used as templates for qPCR. DIP 
RNase-sensitivity analysis was carried out by adding 50 U of RNase H (Invitrogen 
catalogue number 18021-014) in 10x RNase H buffer (75 mM KCI, 50mM Tris 
pH 8.3, 3mM MgCh, 10mM DTT) with 4% glycerol and 20j1g ml BSA before 
immunoprecipitation. The RNase H treatment was performed for 2h at 37°C. 

For comparing POLR2A ChIP and $9.6 DIP signals on the ACTB gene, wild- 
type or control knockdown signals were normalized to 1, and the R1810A mutant 
or knockdown samples were adjusted such that the ratio for the intron 3 (1671) 
position was set to 1. Similarly, for the GAPDH gene, the ratio for the intron 5 
(2436) position was set to 1. ChIP data for senataxin and SMN were expressed 
as ratios to the ChIP data for POLR2A. Error bars represent biological replicates, 
except where indicated otherwise. 

Nuclear run-ons (NRO). NRO was performed according to Skourti-Stathaki et al. 
with modifications!*”!. Approximately 107 cells were incubated on ice in swelling 
buffer (10 mM Tris-Cl pH 7.5, 2mM MgCls, 3mM CaCl) for 5 min, and were 
pelleted. Pellets were resuspended in 1 ml lysis buffer (swelling buffer containing 
0.5% NP4O0, 10% glycerol, and 2U ml“! RNaseOUT (Invitrogen catalogue num- 
ber 10777-019)) and pipetted for lysis, followed by centrifugation. The pellet was 
resuspended in 1 ml freezing buffer (50 mM Tris-Cl pH 8.3, 40% glycerol, 5mM 
MgCl, 0.1 mM EDTA). Reactions contained 10011 resuspended nuclei, 100 1l 
reaction buffer (40 mM Tris pH 7.9, 300mM KCl, 10mM MgCl, 40% glycerol, 
2mM DTT), 500|.M rNTPs (ATP, CTP, GTP) (GE catalogue number 27-2025-01), 
including 125}1.M UTP asa negative control or Br-UTP (Invitrogen catalogue num- 
ber B21551) for 30 min at 30°C. 311 BrdU antibody (Sigma catalogue number 
B8434) was pre-conjugated to 301] Dynabeads protein G beads with 10j1g tRNA 
(Invitrogen catalogue number115401) as block in 100-RSB buffer (10 mM Tris 
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pH 7.4, 100 mM NaCl, 2.5mM MgCh, 0.4% Triton X-100) for 2h at 4°C. The RNA 
was extracted using TRIzol (Invitrogen catalogue number 15596-026) and was heat 
fragmented at 95°C for 8 min. RNA was then mixed with beads and BrdU antibody 
for 2h at 4°C in 5001 100-RSB buffer, 100 U ml“! RNase OUT, 400 U ml"! DNase I 
(Invitrogen catalogue number 18047-019). Immunoprecipitated RNA was washed 
three times with 100-RSB buffer. 

Primer information. Primers used in ChIP, DIP, and NRO are listed here 

For the ACTB gene: —72.£w CCGAAAGTTGCCTTTTATGGC, -72.rev 
CAAAGGCGAGGCTCTGTGC; 332.fw CGGGGTCTTTGTCTGAGC, 332.rev 
CAGTTAGCGCCCAAAGGAG; 1671.fw TAACACTGGCTCGTGTGACAA, 
1671.rev AAGTGCAAAGAACACGGCTAA; 2661.fw GGAGCTGTCACAT 
CCAGGGTC, 2661.rev T@CCACTGGCTCGTGTGACAA; 2911.fw TGCGCA 
GAAAACAAGATGAG, 291 1.rev GTCACCTTCACCGTTCCAGT; 3560.fw 
TTACCCAGAGTGCAGGTGTG, 3560.rev CCCCAATAAGCAGGAACAGA; 
3752.fw GGGACTATTTGGGGGTGTCT, 3752.rev TCCCATAGGTGAA 
GGCAAAG; 4657.fw TGGGCCACTTAATCATTCAAC, 4657.rey CCTCACTTC 
CAGACTGACAGCG; 5590.fw CAGTGGTGTGGTGTGATCTTG, 5590.rev GGC 
AAAACCCTGTATCTGTGA. 

For the GAPDH gene: 55.fw CTCCTGTTCGACAGTCAGC, 55.rev 

TTCAGGCCGTCCCTAGG; 1407.fw CACCCTGGTCTGAGGTTAAATATAG, 
1407.rev GIGGGAGCACAGGTAAGT; 2436.fw ATAGGCGAGATCCCTCCAA, 
2436.rev TGAAGACGCCAGTGGAC; 3882.fw CCCTGTGCTCAACCAGT, 
3882.rev CTCACCTTGACACAAGCC; 4511.fw AGATGTGTCAGGG 
TGACTTAT, 4511.rev TAGGTCCCAGCTACACGC; 5196.fw GICTCAGTGTAT 
GACAGACAGG, 5196.rev TGTATGTGCGCTCAGGG. 
Chromatin immunoprecipitation and sequencing analysis (ChIP-seq). 
Chromatin immunoprecipitation was performed as before™. In brief, 10” to 10° 
cells were crosslinked for 10 min in 1% formaldehyde. Lysates were sonicated 
to a DNA fragment length range of 200-300 bp using a Bioruptor (Diagenode). 
RNAP II was immunoprecipitated with 21g of antibodies and Dynabeads Protein 
G (Invitrogen). Subsequently, crosslinks were reversed at 65°C overnight and 
bound DNA fragments were purified (EZ-10 spin column PCR product purifi- 
cation kit, Bio Basic). Sequencing libraries were constructed using the TruSeq 
ChIP sample prep kit (Illumina) according the manufacturer’s instructions. 
Libraries were sequenced (single-end reads) on the Illumina HiSeq 2500 to a 
minimum depth of 30 million reads, obtaining at least 10 million unique reads per 
sample. 

ChIP-seq analysis was performed chiefly as before**. For ChIP-seq, reads 
in FASTQ format were mapped to the human genome (hg19) using Bowtie 2 
(ref. 55) with local alignment, duplicate reads were removed, and reads were 
extended to 300 bp. Pileups—the number of fragments overlapping each genomic 
bp—were calculated, and were normalized by million mappable reads in the ChIP- 
seq library. Normalized pileups from different replicates were then averaged to 
create FPM (fragments per million reads). Data for ChIP-seq analyses have been 
deposited in GEO with the accession code GSE73379. 

RNA extraction and gPCR. RNA was used for cDNA synthesis with the 
SuperScript VILO Kit (Invitrogen catalogue number 11754). PCR was performed 
using the Phusion-high fidelity PCR kit (Thermo Scientific catalogue number 
F-553S), and qPCR was performed with Fast SYBR Green Master qPCR mix, using 
the Applied Biosystems 7300 real time PCR System (catalogue number 4406984). 
qPCR consisted of 40 cycles of 95°C for 15s and 55°C for 30s, and a final cycle 
(95°C for 15s and then 60°C) generated a dissociation curve. Input DNA or RNA 
reverse transcribed into cDNA were used to calculate the per cent enrichment in 
the immunoprecipitated samples. 

Antibodies, constructs and reagents. Anti-CTD R1810me2s antibody was 
raised in rabbits using a KLH-conjugated CTD peptide from POLR2A (amino 
acids 1806-1813) that carried an R1810me2s modification. KLH conjugation was 
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performed using an N-terminal cysteine residue (Cedarlane). R1810me2s-specific 
antibodies were enriched by flowing the serum through a column containing an 
R1810me0 peptide conjugated to SulfoLink Coupling Resin (Thermo Scientific 
catalogue number 20401). 

GST fusion constructs containing CTD N-terminal repeats 1-29 and C-terminal 
repeats 33-52 were provided by J. Manley®®. Flp-in T-REx GFP-HB fusion con- 
struct that contains the R-loop binding domain of RNase H was provided by 
A. Aguilera”. The ORFs for POLR2D, TDRD3, PRMTS, and SMN came from 
the plasmid collection at Harvard. CMV promoter-driven Flag-tagged TDRD3, 
GFP, and PRMT5 constructs for HEK293 cell culture were generated using the 
MAPLE system as previously described’’. «-amanitin-resistant wild-type and 
R1810A mutant POLR2A constructs were kindly provided by D. Eick’. RNAP II 
R1810 me2a antibody was provided by D. Reinberg’. We obtained the POLR2A 
pSer2 and pSer5 antibodies from the Eick laboratory (S2P: 3E10; S5P: 3E8). We 
obtained the $9.6 antibody for R-loop DIP from S. Leppla. 8WG16 monoclonal 
antibody against unphosphorylated CTD repeats of POLR2A was prepared in our 
laboratory. 

Commercial antibodies were as follows: Y12 (Abcam monoclonal antibody cat- 
alogue number ab3138); Sym10 (Millipore polyclonal antibody catalogue num- 
ber 07-412); Asym24 (Millipore polyclonal antibody catalogue number 07-414); 
TDRD3 (Santa Cruz polyclonal antibody catalogue number C-20); HA (Sigma 
monoclonal antibody catalogue number H9658); PRMT5 (Upstate polyclonal anti- 
body catalogue number C7-405, Santa Cruz monoclonal antibody catalogue num- 
ber sc-22132); SMN (Santa Cruz polyclonal antibody catalogue number H-195); 
SETX (for ChIP and IP (Novus Biologicals polyclonal antibody catalogue number 
NB100-57543) and for western blots (Bethyl Lab polyclonal antibody catalogue 
number A301-104A)); XRN2 (Santa Cruz polyclonal antibody catalogue num- 
ber sc-99237); POLR2A N20 (Santa Cruz polyclonal antibody catalogue number 
sc-899); POLR2A 4H8 (Abcam monoclonal antibody catalogue number ab5408); 
POLR2A H224 (Santa Cruz polyclonal antibody catalogue number sc9001); 
\H2AX (Millipore, catalogue number 05-636); H2AX (Millipore, catalogue 
number 07-627); tubulin (Sigma monoclonal antibody catalogue number T8328); 
Flag (Sigma monoclonal antibody catalogue number F1804); GFP (Abcam 
polyclonal antibody catalogue number 290); IgG negative controls for ChIP and 
IP (Millipore polyclonal antibody catalogue number 12-370). a-amanitin was 
purchased from Sigma (catalogue number 23109-05-9). 
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Extended Data Figure 1 | See next page for caption. 
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Extended Data Figure 1 | The R1810me2s and R1810me2a 
modifications on POLR2A depend on PRMT5 and CARMI, 
respectively. a, POLR2A carries Rme2a and Rme2s modifications. 
Whole-cell lysates (WCL) from HEK293 cells stably expressing 
Flag-tagged TDRD3 or the POLR2D subunit of RNAP II were used 
for immunoprecipitation using beads conjugated with M2 anti-Flag 
antibody, and the precipitates were western blotted with the indicated 
antibodies. Cells expressing Flag—GFP were used as a negative control. 
Precipitated TDRD3 and POLR2D contained POLR2A with the Arg-me2a 
modification (ASYMM24 antibody), whereas precipitated POLR2D, 
and not TDRD3, contained POLR2A with the Arg-me2s modification 
(SYMM10 and Y12 antibodies). b, Whole-cell lysate western blot 
controls for Figure 1b. c, Y12 and R1810me2s recognition of RNAP II 
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CTD R1810me2s is blocked by surrounding phosphorylated residues. 
The detection of R1810me2s improves for both antibodies when the 
precipitated samples are treated with alkaline phosphatase. d, Slot 
blots illustrating that the Y12 and R1810me2s antibodies specifically 
recognize peptides containing RNAP II R1810me2s. The indicated 
amounts of biotin-labelled 7mer CTD peptides bracketing R1810 with no 
modification, Arg-me2a, and Arg-me2s were blotted before incubating 
with the R1810me2s or Y12 antibodies. e, Western blot confirming 
efficient PRMT5 knockdown, and RT-qPCR assay confirming efficient 
CARM1 knockdown for experiment of Fig. 1c. f, g, Whole-cell lysate 
western blot controls for Fig. 1c,d, respectively. h-j, Whole cell lysate 
western blot controls for Fig. 3c-e, respectively. 
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Extended Data Figure 2 | Recognition of R1810me2s by SMN. 

a-c, In vitro fluorescence polarization (FP) peptide binding assays. 

a, Recombinant SMN Tudor domain was incubated with FITC- 
labelled 13-mer CTD peptides bracketing either R1810 or R1603. SMN 
preferentially bound CTD peptides in the order R1810me2s > R1603me2s 


Y1P, S2P or both upstream of R1810me2s, showing slightly preferential 
binding to the peptides when the phospho-modification(s) are present. 
c, FITC-CTD R1810me2s or FITC-CTD R1810mez2a is not recognized 
by other recombinant Tudor domains from SMNDCI1 (also known as 
SPF30), TDRD1, TDRD2, TDRD9 or TDRD11 (also known as SND1). 


~ R1810me2a > R1603me2a, and exhibited no detectable affinity for the 
unmodified peptides. b, Recombinant SMN Tudor domain was incubated 
with FITC-labelled CTD peptides bracketing R1810me2s also containing 


d, Isothermal titration calorimetry assays showing that the recombinant 
SMN Tudor domain has no enhanced binding to R1810me2s containing 
peptides also carrying S2P or both Y1P and S2P. 
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Extended Data Figure 3 | See next page for caption. 
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Extended Data Figure 3 | Interaction of SMN and senataxin with 
RNAP II depends on PRMT5. a, PRMT5 overexpression increases 

the R1810me2s modification and the SMN and senataxin associations 
with RNAP II. Left, western blots with the indicated antibodies of 
HEK293 whole-cell lysates overexpressing Flag-tagged PRMT5 or GFP. 
Overexpressing PRMT5 does not increase the amount of SMN 

or senataxin. Right, endogenous POLR2A was immunoprecipitated 

(N20 antibody) from HEK293 cell lysates with overexpressed Flag-tagged 
PRMT5 or GFP. b, SMN associates with phospho-isoforms of RNAP II. 
Immunoprecipitation with SMN antibody, but not control IgG, 
co-precipitated RNAP II with unmodified CTD repeats (8WG16 antibody) 
and CTD repeats phosphorylated on Ser5 and Ser2 as detected by western 


blotting. c, Requirement of PRMT5 for association of SMN and senataxin 
with RNAP II. Left, western blots with the indicated antibodies of 
HEK293 whole-cell lysates expressing siRNAs for PRMT5 or SMN. Right, 
endogenous POLR2A was immunoprecipitated (N20 antibody) from cells 
with transient siRNA-mediated knockdown of PRMT5 or SMN. Western 
blots were performed with the indicated antibodies. PRMT5 knockdown 
causes loss of R1810me2s on POLR2A, as well as reduced association 

of SMN and senataxin with RNAP II. SMN knockdown causes reduced 
association of senataxin with RNAP II. d, SMN ChIP-seq (with GFP 
antibody against inducible GFP-SMN, or with SMN-specific antibody in 
HEK293 cells). Both methods observe enriched SMN signals at promoter 
and termination regions; the average of both is shown. 
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Extended Data Figure 4 | Association of SMN and senataxin 

with GAPDH depends on R1810 and PRMTS5. a, Chromatin 
immunoprecipitation (ChIP) was used to determine the distribution of 
SMN along the human GAPDH gene in HEK293 cells, expressed as percent 
input or as a ratio of SMN to RNAP IL Error bars represent technical 
variation in a single experiment (mean + s.e.m., n =3). b, SMN ChIP was 
performed in HEK293 cells stably expressing shRNAs for PRMT5 or GFP 
(as a control). With the control normalized to 1, knockdown of PRMT5 
caused strong reductions of the SMN ChIP signals all along GAPDH. 
Error bars represent biological replicates (mean + s.e.m., n= 3). c, SMN 
ChIP signals on GAPDH decrease in Raji cells expressing HA-tagged 
R1810A mutant POLR2A after 3 days of treatment with a-amanitin to 
eliminate endogenous POLR2A. ChIP results with wild-type HA-tagged 
POLR2A were normalized to 1. Error bars represent biological replicates 


(mean +s.e.m., n= 3). d, ChIP in HEK293 cells showing the distribution 
of senataxin along the human GAPDH gene, expressed as percent input 

or as a ratio to RNAP IL. Error bars represent technical variation in a 
single experiment (mean + s.e.m., m= 3). e, Senataxin ChIP signals 

on GAPDH decrease in the POLR2A (R1810A) mutant after 3 days of 
treatment of Raji cells with a-amanitin to eliminate endogenous POLR2A. 
Results are normalized to wild-type POLR2A and expressed as the ratio 
of senataxin to RNAP IL. Error bars represent biological replicates (mean 
+ s.e.m., 1 =3). f, Knockdown of PRMT5 or SMN causes reductions of 
the senataxin ChIP signals all along GAPDH. ChIP against senataxin 

was performed in HEK293 cells with shRNA-mediated knock-down of 
PRMT5 or SMN. Results were normalized to a control knockdown of GFP 
and expressed as the ratio of senataxin to RNAP II. Error bars represent 
biological replicates (mean + s.e.m., n= 2). 
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Extended Data Figure 5 | The R1810 mutation causes RNAP II Raji cells expressing HA-tagged wild-type or mutant (R1810A) POLR2A, 
to accumulate in the termination region of ACTB. Chromatin 3 days after treatment with a-amanitin to eliminate endogenous POLR2A. 
immunoprecipitation (ChIP) with three different POLR2A antibodies Shown are single experiments with error bars representing technical 
(8WG16, H224, 4H8), as indicated, was performed on the ACTB gene in variation (mean + s.e.m., n= 3). 
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Extended Data Figure 6 | See next page for caption. 
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Extended Data Figure 6 | R1810, PRMT5, and SMN regulate 
transcription termination on GAPDH. a, Chromatin 
immunoprecipitation (ChIP) with the N20, 4H8, and 8WG16 antibodies 
to show the distribution of wild-type POLR2A along the human GAPDH 
gene. Error bars represent technical variation (mean + s.e.m., n= 3). 

b, POLR2A ChIP along the GAPDH gene was performed in HEK293 cells 
after stable knockdown of PRMT5 or SMN, using stable knockdown of 
GFP as a negative control. RNAP II over-accumulates at the termination 
sites on GAPDH after knockdown of PRMT5 or SMN. Error bars represent 
biological replicates (mean + s.e.m., n=5). ¢, POLR2A ChIP on the 
GAPDH gene was performed in Raji cells that express HA-tagged wild- 
type or mutant (R1810A) POLR2A 3 days after «-amanitin treatment to 
eliminate endogenous POLR2A. R1810A mutant RNAP IJ over-accumulates 
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downstream of the cleavage and polyadenylation sites where RNAP II 
pauses and terminates transcription on GAPDH. Error bars represent 
biological replicates (mean + s.e.m., n= 4). d, Data from b, c are displayed 
as normalized ratios to the control (GFP knockdown, or HA wild-type 
POLR2A), with the ratio for the intron 5 qPCR primers at 2436 set as 1. 

e, Nuclear run-on experiment in which nuclei from Raji cells expressing 
wild-type or mutant (R1810A) POLR2A 3 days after «-amanitin treatment 
to eliminate endogenous POLR2A were incubated with BrUTP for 30 min, 
and short run-on RNAs were isolated by binding to anti-BrU antibodies. 
The R1810A mutation led to over-accumulation of active RNAP II in the 
region downstream of the poly(A) site on GAPDH. Error bars represent 
technical variation (mean + s.e.m., n = 3). 
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Extended Data Figure 7 | See next page for caption. 
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Extended Data Figure 7 | PRMT5 and SMN but not CARMI regulate 
transcription termination by RNAP II on ACTB and GAPDH. 

a, Chromatin immunoprecipitation (ChIP) for POLR2A with 4H8 
antibody was performed after stable shRNA-mediated PRMT5, CARM1 
or SMN knockdown in HEK293 cells to show that only PRMT5 and SMN 
knockdowns lead to the over-accumulation of RNAP II in the termination 
regions of ACTB. The graph shows a single experiment with error bars 
representing technical variation (mean + s.e.m., n= 3). b, POLR2A ChIP 
with the 8SWG16 antibody was performed after transient siRNA-mediated 
knockdown of PRMT5 or SMN in HEK293 cells to show that PRMT5 

or SMN knockdown leads to the over-accumulation of RNAP II in the 
termination region of 3-actin. The graph shows a single experiment with 
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error bars representing technical variation (mean + s.e.m., n= 3). c, ChIP 
for POLR2A with 4H8 antibody was performed after stable shaRNA- 
mediated knockdown of PRMT5, SMN or GFP (as a control) in HEK293 
cells to show that knockdown of PRMT5 or SMN leads to the over- 
accumulation of RNAP II in the termination region of GAPDH. The graph 
shows a single experiment with error bars representing technical variation 
(mean + s.e.m., n=3). d, POLR2A ChIP with 8WG16 antibody was 
performed after transient siRNA-mediated knockdown of PRMT5 or SMN 
in HEK293 cells to show that knockdown of PRMT5 or SMN leads to the 
over-accumulation of RNAP II in the termination region of GAPDH. The 
graph shows a single experiment with error bars representing technical 
variation (mean + s.e.m., 1 = 3). 
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Extended Data Figure 8 | See next page for caption. 
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Extended Data Figure 8 | RNAP II pausing defect and R-loop 
accumulation are observed in CRISPR SMN knockout cells and in SMA 
disease cells. a, ChIP for POLR2A with N20 antibody was performed 
after stable SMN knockout (CRISPR) in HEK293 cells shows that RNAP 
II accumulates in the termination regions of ACTB. Scrambled guide 
RNA treatment was used as a negative control. Error bars represent 
biological variation (mean + s.e.m., n = 3). b, Accumulation of R-loops in 
the termination regions of the ACTB gene after SMN knockout. A fusion 
protein of GFP-RNase H DNA-RNA hybrid binding domain was stably 
expressed in HEK293 cells. ChIP with GFP antibody (Abcam 290) was 
used for the detection of the R-loops (DNA-RNA hybrids), using the 
indicated primer positions for qPCR along the gene. Scrambled guide 
RNA treatment was used as a negative control. Error bars represent 
biological variation (mean + s.e.m., m= 3). ¢, Top: live cell microscopy 
images showing that HEK293 cells with SMN knockout appear to be 
physiologically normal in comparison to the control scrambled KO. 
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Bottom: western blot with anti-SMN antibody showing that SMN 
expression is knocked out by CRISPR. d, Human cell lines (3 fibroblast, 

3 B lymphocyte) were obtained from the Coriell Institute for Medical 
Research. These include cells from two children with SMA disease and 
their normal parents. e, ChIP for POLR2A (N20, 8WG16 antibodies) 

was performed on the ACTB gene using the averaged value of the parents 
as control. POLR2A in the SMA disease cells (from both fibroblast and 

B cell lines) accumulates in the termination regions of the ACTB gene. 
Error bars represent biological variation (mean + s.e.m., n= 4). f, Top: 
quantification of R-loops by DNA immunoprecipitation (DIP) with the 
S9.6 antibody in the patient cells, showing that the R-loops are sensitive to 
RNase H. Error bars represent technical variation (mean + s.e.m., n= 3). 
Bottom: R-loop DIP with the $9.6 antibody shows that R-loops accumulate 
in the termination regions of the ACTB gene in the SMA disease cells. The 
averaged value of the parents was used as a control. Error bars represent 
biological variation (mean + s.e.m.,n=5). 
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Extended Data Figure 9 | SMN and POLR2A(R1810) effects on ATF4 (a short gene), JUND (an intronless gene), and TUBA1B) using the 


transcription termination by RNAP II occur on a genome-wide level. Integrative Genomics Viewer. The promoter peaks are displayed to the 
a, Western blot using N20 antibody for POLR2A and SMN antibody to left, and the regions underlined in red are RNAP IJ termination regions. 


verify shSMN knockdown of SMN for the ChIP-seq experiment of Fig. 4e; | IgG ChIPseq was used as negative control. Approximately 10 million 
shGFP was used as a control. b, Western blot using N20 and HA antibodies unique RNAP II ChIPseq reads (4H8, 8WG16) were obtained from GFP 
to verify equal expression of HA-tagged wild-type and POLR2A(R1810A) or SMN stable knockdown Raji cells. Approximately 10-12 million unique 
and the effect of a-amanitin treatment on cells without an HA-tagged RNAP II ChIP-seq reads (N20) were obtained from wild-type or R1810A 
construct for the ChIP-seq experiment of Fig. 4e. c, RNAP II ChIP-seq POLR2A Raji cells upon 3 days of amanitin treatment (21g ml). 

results for several housekeeping genes are displayed in detail (B2M, CD40, 
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Repetitive patterns in rapid optical variations in the 
nearby black-hole binary V404 Cygni 
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How black holes accrete surrounding matter is a fundamental yet 
unsolved question in astrophysics. It is generally believed that 
matter is absorbed into black holes via accretion disks, the state of 
which depends primarily on the mass-accretion rate. When this 
rate approaches the critical rate (the Eddington limit), thermal 
instability is supposed to occur in the inner disk, causing repetitive 
patterns of large-amplitude X-ray variability (oscillations) on 
timescales of minutes to hours!. In fact, such oscillations have been 
observed only in sources with a high mass-accretion rate, such as 
GRS 1915+105 (refs 2, 3). These large-amplitude, relatively slow 
timescale, phenomena are thought to have physical origins distinct 
from those of X-ray or optical variations with small amplitudes and 
fast timescales (less than about 10 seconds) often observed in other 
black-hole binaries—for example, XTE J1118+480 (ref. 4) and GX 
339—4 (ref. 5). Here we report an extensive multi-colour optical 
photometric data set of V404 Cygni, an X-ray transient source® 
containing a black hole of nine solar masses’ (and a companion star) 
at a distance of 2.4 kiloparsecs (ref. 8). Our data show that optical 
oscillations on timescales of 100 seconds to 2.5 hours can occur at 
mass-accretion rates more than ten times lower than previously 
thought’. This suggests that the accretion rate is not the critical 
parameter for inducing inner-disk instabilities. Instead, we propose 


that a long orbital period is a key condition for these large-amplitude 
oscillations, because the outer part of the large disk in binaries with 
long orbital periods will have surface densities too low to maintain 
sustained mass accretion to the inner part of the disk. The lack of 
sustained accretion—not the actual rate—would then be the critical 
factor causing large-amplitude oscillations in long-period systems. 

V404 Cyg, which was originally discovered as a nova in 1938 and 
detected by the GINGA satellite in 1989°, underwent an outburst in 
June 2015 after 26 years of dormancy. At 18:31:38 on June 15 (15.77197 
Universal Time (uT)), Swift/Burst Alert Telescope (BAT) initially 
detected this outburst as a possible +-ray burst’®. The outburst was 
also detected by the Monitor of All-sky X-ray Image (MAXI) instru- 
ment on June 16.783 ut! 

Following these detections, we started a world-wide photometric 
campaign (Extended Data Tables 1 and 2, Methods section “Detailed 
methods of optical observations’) partly within the Variable Star 
Network (VSNET) Collaboration, and collected extensive sets of mul- 
ti-colour optical photometric data consisting of >85,000 points. Our 
data set also includes early observations with the Taiwanese-American 
Occultation Survey (TAOS) starting on June 15, 18:34:07 ut, 2 min 29s 
after the Swift/BAT trigger’? (see Extended Data Tables 1 and 2, and 
Methods section “Detailed methods of optical observations’ concerning 
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Figure 1 | Overall multi-colour light curves during the 2015 outburst of 
V404 Cyg. Shown are multi-colour light curves (B, V, R and I bands, and 
no filters) during BJD 2,457,189 to 2,457,202 (BJD 2,457,189 corresponds 
to 2015 June 15). It is clearly seen that dip-type oscillations (variations 
with recurrent sudden dips) were observed from the beginning to the 


the VSNET collaboration team and TAOS). Some weak activity started 
approximately 1,000 before the Swift/BAT trigger’*. The same activ- 
ity above 80 keV was also detected by the active anti-coincidence 
shield (ACS) of the Spectrometer on INTEGRAL (SPI) telescope of 
the INTEGRAL observatory in the same time intervals (P. Minaev, 
personal communication). 

Our observations immediately indicated that large-amplitude short- 
term variations on timescales of ~100s to ~2.5h were already present, 
starting less than three minutes after the Swift/BAT trigger. In Fig. 1 
and Extended Data Fig. 1, we show the overall optical multi-colour light 
curves. The overall trend of the light curves can be divided into three 
stages: (1) gradual rise during BJD (Barycentric Julian Day) 2,457,189 
to 2,457,194.5 (brightening by 1 magd 7! on average); (2) the plateau 
during BJD 2,457,194.5 to 2,457,200.0; and (3) rapid fading during BJD 
2,457,200.0 to 2,457,203.3 (fading on average by 2.5 mag d-'). Short- 
term variations with amplitudes varying between 0.1 mag and 2.5 mag 
were observed throughout the outburst, and consisted of characteristic 
structures such as recurrent sudden dips from a peak (Fig. 1). 

Moreover, fluctuations similar in shape to the unique X-ray varia- 
tions of the enigmatic black-hole binary GRS 1915 + 105” are present 
in the optical light curve of V404 Cyg (Fig. 2). The patterns in the X-ray 
light curve of GRS 1915+105 have been classified into at least 12 cate- 
gories on the basis of their flux and colour characteristics*. Repeating 
structures like these had not been observed in optical wavelengths 
before the 2015 outburst of V404 Cyg. The variations that we observed 
can be divided into two characteristic classes: (1) ‘dip-type’ oscillations 
(repetitions of a gradual rise followed by a sudden dip, sometimes with 
accompanying spikes on timescales of ~45 min to ~2.5h; Fig. 2a-c); 
and (2) ‘heartbeat-type’ oscillations (rhythmic small spikes with short 


end of the outburst. The horizontal axis shows BJD — 2,457,189. The 
significant periods of repetitive optical variations are indicated in grey 
and green shading for the ‘dip-type’ and ‘heartbeat-type’ oscillations, 
respectively. 


periods of ~5 min; Fig. 2d). Although rapid optical variations have 
been detected in the black-hole binary V4641 Sgr, those variations are 
stochastic with no indication of regular patterns'*. The variations we 
found in V404 Cyg at optical wavelengths were regular and similar in 
shape to those in GRS 1915 + 105, although the interval between dips 
is about 5 times larger in V404 Cyg than in GRS 1915 + 105. 

Using X-ray data from Swift/X-ray Telescope (XRT), we compared 
simultaneous optical and X-ray light curves (Fig. 3). When both X-ray 
and optical data showed strong short-term variations, the temporal 
correlations were generally good, although the X-ray flux variations 
are much larger than the optical ones. The good correlation indicates 
that both X-ray and optical observations recorded the same phenom- 
ena (see also Methods section ‘Comparison with X-ray observations’ 
and Extended Data Fig. 2). Spectral analyses of the simultaneous 
X-ray data (Methods section ‘Origin of cyclic dips’ and Extended Data 
Fig. 3) indicate that there was no tendency for increased absorption 
when the X-ray flux decreased, suggesting that these dips do not orig- 
inate in absorption. In some epochs, we found evidence for heavy 
obscuration as found in the GINGA data during the 1989 outburst’®; 
however this is not related to dip-type variations. We can thus infer 
that the short-term fluctuations directly reflect variations in radiation 
from the accretion disk or its associated structures. Detailed analyses 
of the typical simultaneous broad-band spectral energy distribution 
(SED) (Methods section ‘SED modelling’ and Extended Data Fig. 6) 
show that the majority of the optical flux is most likely to be produced 
by reprocessing of X-ray irradiation in the disk. 

For GRS 1915+ 105, it has been proposed that the observed variabil- 
ity is caused by limit-cycle oscillations in the inner accretion disk due to 
Lightman-Eardley viscous instability'®, which can explain a slow rise in 
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Figure 2 | Short-term and large-amplitude optical variations having 
repeating structures in the 2015 outburst of V404 Cyg. a-d, Variations 
with characteristic patterns during BJD 2,457,193.6 to 2,457,194.0 

(a), BJD 2,457,197.7 to 2,457,198.0 (b), BJD 2,457,198.6 to 2,457,198.9 
(c) and BJD 2,457,200.34 to 2,457,200.6 (d). Ina, b and ¢, there are 
gradual rises with increasing amplitudes of fluctuations followed by dips, 


brightness (mass accumulation) followed by a sudden drop (accretion 
to the black hole). Such a model assumes that the black hole is accreting 
mass nearly at the Eddington rate, which is supported by observations 
of GRS 1915+ 105!”. Similar types of X-ray variability have also been 
detected in the black-hole binary IGR J17091—3624 (ref. 18), whose 
Eddington rate is unknown because both the mass and the distance 
are uncertain. 


@ Swift/XRT 0.5-10 keV e@ | band 


during which fluctuations disappear. These variations are sometimes 
accompanied by spikes. The interval between two dips ranges from 

~45 min to ~2.5h. d, Repetitive small oscillations with high coherence 
at intervals of ~5 min. The shapes of these oscillations resemble those of 
GRS 1915+105°. 


In V404 Cyg, however, the accurate determination of the distance 
based on a parallax measurement® and the dynamical mass determi- 
nation’ enable us to conclude from our 2015 data that the black hole in 
this system was accreting at a much lower rate than the Eddington rate 
most of the time. During the period when GRS 1915 + 105-type vari- 
ations in the optical light curves were recorded in V404 Cyg, its bolo- 
metric luminosity, averaged over an interval longer than the period of 
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Figure 3 | Correlation between optical and X-ray fluctuations of 

V404 Cyg in the 2015 outburst. The times covered in each panel are BJD 
2,457,194.126 to 2,457,194.140 (a), BJD 2,457,197.050 to 2,457,197.065 
(b), BJD 2,457,198.760 to 2,457,198.780 (c) and BJD 2,457,199.430 to 
2,457,199.450 (d). In each panel, the left-hand y axis shows magnitude in 
bands I, R, V and B, and the right-hand y axis shows counts per second 
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in the Swift/XRT 0.5-10 keV band. Panels a and b cover the fading and 
rising phases, respectively; panels c and d show the correlations of short- 
term fluctuations. When both X-ray and optical light strongly varied, the 
correlation is generally good (though note in a, c and d that optical dips 
lag behind X-ray dips). Navy blue error bars, +10. We plot points without 
errors when errors are smaller than or comparable to the plotting symbols. 
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Figure 4 | The bolometric luminosity 1,1 of V404 Cyg during the 2015 
outburst. It is normalized to the Eddington luminosity assuming a black 
hole mass of 9M. Black points, Swift/BAT survey data (15-50 keV); red 

points, from the public Target Opportunity release of INTEGRAL Imager 


oscillation, spanned a wide range, from ~0.01 Legg to ~0.4 Lega (where 
Lgaa is the Eddington luminosity for a nine solar-mass, Mo, black hole), 
as estimated from the hard X-ray flux and SED (Fig. 4 and Methods 
section “Time history of the bolometric luminosity’). Remarkably, the 
dip-type oscillations were observed at mean bolometric luminosity of 
~0.015 Leda, 0.07 Leag and ~0.06 Legg during BJD 2,457,191.35 to 
2,457,191.60, BJD 2,457,192.34 to 2,457,192.70, and BJD 2,457,200.60 
to 2,457,200.76, respectively. 

It is also worth noting that a typical dip similar to those seen in 
GRS 1915+105 was observed just 3 min after the first detection of this 
outburst (Extended Data Fig. 1b). This fact suggests that the accretion 
rate is not the critical parameter for inducing these oscillations. Our 
results imply that there is a novel type of disk instability that is different 
from the known dwarf-nova type’? or the Lightman-Eardley type’®. 

We point out that black-hole binaries showing large-amplitude, 
short-term variations either in X-ray or optical bands have long orbital 
periods (33.9din GRS 1915+105”, ~4d in IGRJ17091—36247!, 6.5d 
in V404 Cyg”, and 2.8d in V4641 Sgr”; see Methods section ‘Objects 
showing violent short-term variations in outburst’ and Extended Data 
Table 3 for a comparison of these objects), reinforcing the earlier 
suggestion of this link between violent oscillations and long orbital 
periods”. It has been proposed that the accretion disk in a sys- 
tem with a long orbital period suffers from instabilities in the disk’s 
vertical structure, and hence the disk beyond this radius of insta- 
bility may never build up!*”°. Our SED modelling of this outburst, 
however, requires a disk having a large radius (>1.7 x 10'* cm), 
even considering the uncertainty of the interstellar reddening, to 
account for the ultraviolet flux particularly. This result implies that 
the disk extended up to distances close to the maximum achievable 


on Board the Integral Satellite (IBIS)/CdTe array (ISGRI) monitoring 
(25-60 keV). Grey and green shadings represent respectively the periods 
of the ‘dip-type oscillations’ and the ‘heartbeat-type oscillations. Black and 
red error bars, tlo. 


radius (Methods section ‘SED modelling’). This radius is consist- 
ent with the short-term optical variations detected below 0.01 Hz 
(Extended Data Fig. 5 and Methods section “Power spectra’) and 
the time lag of ~1 min between the X-ray and optical light curves 
(Fig. 3 and Extended Data Fig. 2) if we assume that the optical light 
mainly comes from reprocessed X-rays. We note that synchrotron emis- 
sion has been proposed to be the origin of the short-term and large-am- 
plitude fluctuations in the case of V4641 Sgr’. The optical polarization 
of V404 Cyg, however, did not show evidence of significant variations 
during the 2015 outburst”®”’. This fact disfavours synchrotron emission 
as the origin of the short-term variations. 

Outbursts of X-ray transients are thought to be triggered by the 
dwarf-nova-type instability: once the surface density at some radius 
reaches the critical density (2i,) after continuous mass transfer from 
the secondary star, thermal instability occurs and the disk undergoes an 
outburst!®. In systems with long orbital periods, it is difficult for surface 
densities in the outer disk to reach Yuit, which is roughly proportional 
to the radius®. As a result, thermal instability in the inner part of the 
disk occurs more easily and governs the outburst behaviour”’. This is 
probably the reason why long-period systems behave differently from 
short-period ‘classical’ X-ray transients. In fact, our estimate of the disk 
mass (5 x 10”° g) accreted during the 2015 outburst is far smaller than 
the mass (2 x 108g) of a fully built-up disk in quiescence (Methods 
section ‘Estimation of the disk mass and comparison with previous 
outbursts’ and Extended Data Fig. 4). These values indicate that the 
surface density was well below the Yi required to induce thermal 
instability in most parts of the disk at the onset of the present outburst. 
Once the X-ray outburst started in the inner region, hydrogen atoms 
in the outer part of the disk would have been ionized and ‘passively’ 
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maintained in the hot state as long as the X-ray illumination contin- 
ued. This explains the large optical fluxes observed”*. The rapid decay 
observed in the 2015 outburst of V404 Cyg may reflect the lack of the 
exponential decay in long-period systems as theoretically predicted*®. 
Because the surface densities in the rest of the disk were too low to sus- 
tain the outburst by viscous diffusion”, only the inner part of the disk 
was responsible for the dynamics of the present outburst, as inferred 
from the rapid fading from the outburst (Methods section “Disk radius 
inferred from final fading rate’). We infer that, in outbursts of IGR 
J17091—3624'*! and the 1938 outburst of V404 Cyg (Methods section 
‘Estimation of the disk mass and comparison with previous outbursts’ 
and Extended Data Fig. 4), the radius of the active disk is larger, which 
explains why the duration of those events is longer than that of the 2015 
outburst of V404 Cyg. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 

Detailed methods of optical observations. Immediately after the detection by 
Swift/BAT on June 15.77197 ut, the VSNET collaboration team”! started a world- 
wide photometric campaign of V404 Cyg. There was also an independent detection 
by CCD (charge coupled device) photometry on June 16.169 uT*”. Time-resolved 
CCD photometry was carried out at 27 sites using 36 telescopes with apertures of 
dozens of centimetres (Extended Data Table 2). We also used the public AAVSO 
data**. We corrected for bias and flat-fielding in the usual manner, and performed 
standard aperture photometry. The observers, except for TAOS™, used standard 
filters (B, V, Rc; Ic; we write R and I for Rc and Ic in the main text and figures for 
brevity) and measured magnitudes of V404 Cyg relative to local comparison stars 
whose magnitudes were measured by A. Henden (sequence 15167RN) from the 
AAVSO Variable Star Database*>. We applied small zero-point corrections to some 
observers measurements. When filtered observations were unavailable, we used 
unfiltered data to construct the light curve. The exposure times were mostly 2-30s, 
with some exceptional cases of 120s in B band, giving typical time resolution of a 
few seconds. All of the observation times were converted to BJD. 

Comparison with X-ray observations. For the Swift/XRT light curves 
(Fig. 3 and Extended Data Fig. 2), we extracted source events from a region with a 
30-pixel radius centred on V404 Cyg. To avoid pile-up effects, we further excluded 
an inner circular region if the maximum count rate of the XRT raw light curves, 
binned in 10s intervals, exceeded 200 countss~!. The inner radii are set to be 10 
and 20 pixels at the maximum raw rate of 1,000 counts s! and 2,000 counts s~!, 
respectively, and those for intermediate count rates were determined via linear 
interpolation between the two points. The presented light curves were corrected 
for photon losses due to this exclusion by using the xrtlccorr tool. In addition, from 
Fig. 3a, c and d, we can see a time delay in the start of a dip in optical light, relative 
to that in X-rays. The delay time was ~1 min, which is similar to the reported value 
of 0-50s (ref. 36). This was determined by cross-correlating the U-band and X-ray 
(0.3-10 keV) light curves obtained with Swift/UltraViolet and Optical Telescope 
(UVOT) and Swift/XRT on ur 2015 June 21°°. The observations were carried out 
when the source showed little rapid optical flickering and no extreme flares, and 
thus the nature of the lag may be different from that in our observations. We also 
note that the apparent difference between the Swift/UVOT and the ground-based 
times” is caused by the drift of the clock on board the satellite, to which we have 
applied the necessary corrections. 

Origin of cyclic dips. In order to examine the possibility that absorption by gas 
in the line-of-sight causes the observed violent flux variations in the optical and 
X-ray bands (Fig. 3), we studied intensity-sliced X-ray spectra. A striking exam- 
ple is shown in Extended Data Fig. 3a. The period shown corresponds to that in 
Fig. 3a when both the X-ray and optical fluxes exhibited a sudden intensity drop 
towards the latter part of the period. We divided it into five intervals (T1 to T5; 
Extended Data Fig. 3a), and generated spectra through the tools xrtpipeline and 
xrtproducts in standard pipeline processing. We excluded the central 60-arcsec- 
ond strip from this Windowed Timing (WT) mode data, to avoid the heavy 
pile-up effect when the raw count rate exceeds ~150 countss-'. We compared the 
vF, spectra of the five intervals, where the spectra are fitted by a single power-law 
model multiplied by photoelectric absorption (phabs x pegpwrlw; in the standard 
X-ray spectral fitting package XSPEC). The absorbed X-ray flux ranges by two 
orders of magnitude, from 2.1 x 10-°ergs-!cm~ in T5 to 3.0 x 10-ergs ‘cm? 
in T3. However, the best-fit column density and photon index were relatively 
stable over the five intervals, ~(2-6) x 10-7! cm~? and ~1.0-1.5, respectively. 
Since the X-ray spectrum does not show a noticeable rise in column density 
when the X-ray flux sharply dropped, and since there is no stronger iron edge 
in the latter part of the observation, absorption cannot be the primary cause 
of the time variation in our data sets that cover the X-ray and optical bands 
simultaneously. 

Objects showing violent short-term variations in outburst. In Extended Data 
Table 3 we show the list of X-ray binaries that have shown violent short-term 
variations either in X-rays or in optical wavelengths. 

IGR J17091—3624 is known as the second black hole X-ray binary whose X-ray 
light curves showed a variety of patterns, resembling those of GRS 1915 + 105!%. 
The variations observed in the 2011 outburst of this object were classified as 
p Cheartbeat’), v (similar to class p but with secondary peak after the dips), 
a (‘rounded-bumps’), (3/A (repetitive short-term oscillations after low-quiet 
period) and ju (ref. 18). 

The Rapid Burster (RB or MXB 1730—335), a low-mass X-ray binary (LMXB) 
containing a neutron star (NS), was discovered by Small Astronomy Satellite 
(SAS-3) observations*”. This object has been recently reported to show cyclic long 
X-ray bursts with periods of a few seconds resembling class p (‘heartbeat’) vari- 
ations and those with periods of 100-200 s resembling class @ (“M”-shaped light 
curves) variations of GRS 1915+ 10574. The emission of the Rapid Burster did not 
reach the Eddington luminosity during these variations**. 
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V4641 Sgr was originally discovered as a variable star” and was long confused 
with a different variable star, GM Sgr*®. V4641 Sgr is famous for its short and 
bright outburst in 1999, which reached a optical magnitude of at least 8.8 mag 
(refs 41-44). V4641 Sgr showed short-term variations in optical wavelengths 
during the 2002, 2003 and 2004 outbursts!*4°*”, It was the first case in which 
short-term and large-amplitude variations in the optical range during an out- 
burst were detected. V4641 Sgr is classified as a LMXB, and has a long orbital 
period. Its mass-accretion rate is less than the Eddington rate (except for the 1999 
outburst“), These properties are similar to those of V404 Cyg. However, while 
the short-term variations of V4641 Sgr seemed to be random, those of V404 Cyg 
showed repetitive patterns; this is the greatest difference between these two objects. 
There has been a suggestion that V4641 Sgr is a ‘microblazar”? because the jets 
observed during the outburst in 1999 were proposed to have the largest bulk 
Lorentz factor among known galactic sources”, 

There are also other X-ray transients showing short-term optical variations (for 

example, XTE J1118+480 and GX 339—4). However, these two sources are quasi- 
periodic oscillations (QPOs), characterized by very short periods. The periods 
are much shorter than those of repetitive patterns (tens of seconds to a few hours) 
that we discuss in this Letter. Furthermore, the amplitudes of their variations are 
significantly smaller than those observed in V4641 Sgr*°? on timescales longer 
than tens of seconds. 
Estimation of the disk mass and comparison with the previous outbursts. 
Following the method in ref. 15, we estimated the mass stored in the disk at the 
onset of the outburst. By integrating the X-ray light curve of Swift/BAT and assum- 
ing the spectral model C in table 1 in ref. 15, we obtained a value of 5.0 x 10° g 
assuming a radiative efficiency of 10% and a distance of 2.4+ 0.2 kpc (ref. 8). 
The mass during the 1989 outburst has been updated to 3.0 x 10” g by using this 
updated distance. The stored mass in the 2015 outburst was approximately the 
same as that in the 1989 one. As discussed in ref. 15, these masses are far smaller 
than the mass of a fully built-up disk, estimated to be 2.0 x 1078 g, if these outbursts 
were starting at the outermost region. 

We compare the published optical light curves of the 1989 and 1938 out- 
bursts*+°? with our data from the 2015 outburst (Extended Data Fig. 4). We can 
see that these outbursts have different durations. The 1938 outburst was apparently 
longer than the others, and it may have had different properties from the 1989 
and 2015 ones. The fading rates of the 1989 and 2015 outbursts are significantly 
larger than those of classical X-ray transients, or of FRED (fast rise and exponen- 
tial decline)-type outbursts, such as 0.028 magd~! in V518 Per= GRO J0422+32 
(ref. 53) and 0.015 magd! in V616 Mon = A0620—00 (ref. 54). This supports the 
hypothesis that the outbursts in 1989 and 2015 are different from typical outbursts 
of classical X-ray transients and that the stored disk mass was a factor of ~10° 
smaller in the 1989 and 2015 outbursts than the mass of a fully built up disk. 
Power spectra. We performed power spectral analyses on BJD 2,457,193, BJD 
2,457,196 and BJD 2,457,200. We used the continuous and regularly sampled 
high-cadence data set obtained by LCO (Extended Data Table 1) with exposure 
times of 5s (on BJD 2,457,193) and 2s (others). The durations of these observations 
are 1.4, 3.1 and 2.2h, respectively. Considering the read-out times of 1 s, the Nyquist 
frequencies of these observations are 0.08 and 0.17 Hz, respectively. The power 
spectral densities (PSDs) were calculated using powspec software in the FTOOLS 
Xronos package on magnitude measurements. We did not apply de-trending of 
the light curve since the durations of the individual observations were shorter 
than the timescale of the global variation of the outburst. The power spectra are 
well expressed by a power law (Pac f~/) with an index of 1.9+0.1, 1.8+0.1, 
and 2.3+ 0.1 on BJD 2,457,193, 2,457,196 and 2,457,200, respectively (Extended 
Data Fig. 5). Interpretation of the physical origins on the basis of these variations 
is difficult, because a power law index of ~2 in the PSDs is often observed in 
natural phenomena. In this region (f< 0.01 Hz), the power originating in the opti- 
cal variations of V404 Cyg is significantly higher than that of white noise estimated 
from the observations. 

We next summarize the other reports on short-term variations of V404 Cyg 
during the present outburst. On BJD 2,457,191, this object was observed using the 
Argos photometer on the 2.1m Otto Struve Telescope at McDonald Observatory 
with an exposure time of 2 s**. They reported that the power spectrum was domi- 
nated by steep red noise. Observations on BJD 2,457,193 and BJD 2,457,194 were 
also performed using the ULTRACAM attached with the 4.2m William Herschel 
Telescope on La Palma observatory with a high time resolution (466.8 ms)°°, They 
reported that the variations were dominated by timescales longer than tens of 
seconds. Although large amplitude flares (0.3-0.4 mag) on timescales shorter than 
1s were reported”, these flares may be of different origin. For the variations with 
timescales longer than 100 s, our results agree with these reports*?”®. 

Disk radius inferred from final fading rate. The timescale 7 of heating/cooling 
waves in dwarf novae and X-ray transients* is a function of the mass of the central 
object (M;) and radius (r) with the form T x aM a 23/2, where ais the viscosity 
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parameter”. Here, we estimate the disk radius of V404 Cyg assuming that the 
timescale of the final fading reflected a dwarf nova-type cooling wave. Using the 
Kepler data of V344 Lyr and V1504 Cyg, we measured a fading rate of 1.5magd_! 
of the normal outbursts immediately preceding superoutbursts. During the out- 
bursts in V344 Lyr and V1504 Cyg™, the disk radius is expected to be very close 
to the 3:1 resonance radius. Adopting a typical mass of a white dwarf in a cataclys- 
mic variable (M; =0.83M.,,; ref. 61), we estimated the disk radius of V404 Cyg to 
be 7.8 x 10!°cm for a black hole mass of 9M.5. This is much smaller than the radius 
(1.2 x 10'?cm) expected for a fully built-up disk». 

SED modelling. Extended Data Fig. 6a shows the multi-wavelength SED on BJD 
2,457,199.431 to 2,457,199.446, when the source was simultaneously observed in 
the X-ray, ultraviolet (UV) and optical bands. The optical fluxes in the V and Ic 
bands are taken from our photometric data averaged over the period. Note that 
Rc-band data are also available but not used here, because of the contamination 
by the continuum strong Ha line“. 

The X-ray spectrum is extracted from simultaneous Swift/XRT data (ObsID 
00031403058) which were taken in the WT mode. The data are processed through 
the pipeline processing tool xrtpipeline. The events detected within 20 pixels 
around the source position are removed to mitigate pile-up effects. The U-band 
flux is obtained from the Swift/UVOT images with the same ObsID as the XRT, 
through the standard tool uvot2pha provided by the Swift team. A circular region 
centred at the source position with a radius of 5 arcsec is adopted as the source 
extraction region of the UVOT data. The optical, UV and X-ray data are corrected 
for interstellar extinction/absorption by assuming Ay (interstellar extinction in the 
V band) = 4 (ref. 65) and using the extinction curve in ref. 66 and the Ny (hydro- 
gen column density) versus E(B—V) relation in ref. 67. Radio data are from the 
RATAN-600 observation performed in the same period”. 

The multi-wavelength SED can be reproduced with the diskir model®”°, which 
accounts for the emission from the accretion disk, including the effects of 
Comptonization in the inner disk and reprocessing in the outer disk. We find that 
partial covering X-ray absorption (using the pcfabs model implemented in the 
spectral analysis software XSPEC) improves the quality of the fit significantly. The 
inner-disk temperature is estimated to be 0.12 + 0.01 keV, and the electron tem- 
perature and photon index of the Comptonization component, the ratio between 
the luminosity of the Compton tail and disk blackbody (Lc/La), and the fraction 
of the bolometric flux thermalized in the outer disk (four), are 17.5 + 0.8 keV, 
1.78 + 0.03, 1.17 £0.03, and 1.3798 x 10-2, respectively (the errors in this 
section represent 90% confidence ranges for one parameter). The inner radius 
(Rin) is estimated to be (1.5-5.4) x 108 cm, and the outer radius (Rout) is 
(2.50.3) x 10!2cm. The derived value of Rout is comparable to or even larger 
than the binary separation (~2.2 x 10’? cm). However, it could be smaller due to 
uncertainties in interstellar/circumbinary extinction”! and/or the contribution of 
jet emission. For instance, if Ay is 0.4 mag larger than the assumed value (4.0), Rout 
becomes (1.9+0.2) x 10!2cm. The maximum achievable radius of a stable disk for 
aq (mass ratio) = 0.06 object (Extended Data Table 3) is around 0.62A (radius of 
the 2:1 resonance) to ~0.7A (tidal limit), where A is the binary separation’. 
Considering the uncertainties, the result of our analysis (= 0.77A) is compatible 
with this maximum radius. Our result appears to favour a large Ay value. For the 
partial covering absorber, the best-fit value of the column density is 
5.2*9'3 x 1073 cm” and that of the covering fraction is 64+4%. 

The radio SED can be approximated by a power-law with a photon index of 
~1, as in other black hole binaries in the low/hard state’*. This profile is likely to 
be generated by the optically-thick synchrotron emission from compact jets”. 
Because an optically-thick synchrotron spectrum often extends up to the millime- 
tre to near-infrared bands’*-”’, it may contribute to the optical fluxes, in particular 
at longer wavelengths. The blackbody emission from the companion, a K3III-type 
star” with a radius of ~3 Rs and a temperature of ~4,320K, contributes to the 
SED negligibly. 

Extended Data Figure 6b plots the simultaneous SED on BJD 2,457,191.519 to 
2,457,191.524, which is ~2 orders of magnitude fainter in the X-ray band than that 
shown in the left panel. The X-ray, UV and optical data are taken from the Swift 
data (ObsID 00031403038) and our photometric measurements in the same man- 
ner as described above. This SED can be reproduced with the irradiated disk model 
as well, with somewhat smaller photon index (1.43")'93) and inner-disk tempera- 
ture (<0.07 keV), anda larger f,,,, (0.06")'}) than those on BJD 2,457,199.431 to 
2,457,199.446. 

Time history of the bolometric luminosity. The bolometric luminosity Lyo1 of 
V404 Cyg is evaluated based on the hard X-rays above ~15 keV where the intrinsic 
spectrum is less affected by an absorption. 

We processed the Swift/BAT archival survey data via batsurvey in the HEAsoft 
package to derive count rates with individual exposures of ~300s. Even within this 
short exposure, photon statistics are good during bright states (>0.05 countss~'). 
Assuming a Crab-like spectrum (1 Crab 0.039 countss~'), the BAT count rates R 


(counts s~') are then converted into 15-50keV flux (F15-so) and luminosity (L15_50) 
using F155) =3.6 x 10-7R (ergs !cm~') anda fiducial distance of 2.4 kpc, respec- 
tively. In Fig. 4, we show Lpoi after multiplying by a conversion factor Lpoi/Lis-so=7 
determined from SED modelling (previous section). We find that this bolo- 
metric correction factor lies within the range 2.5-10 by fitting 19 X-ray(XRT)- 
optical simultaneous SED in different periods between BJD 2,457,192.019 and 
2,457,201.011. Since the BAT survey data are rather sparse, in order to catch shorter- 
term variations, we further overlaid the INTEGRAL IBIS/ISGRI monitoring in the 
25-60 keV band available at ref. 78, assuming a conversion parameter of 1 Crab 
rate to be 172.1 countss~! and a bolometric correction factor of Lyoi/L25_69 = 9.97. 
The luminosity was highly variable during the outburst, changing by five orders 
of magnitude. While V404 Cyg sometimes reaches the Eddington luminosity (Lgaa) 
at the peak of multiple sporadic flares, it also repeatedly dropped below 1-10% of 
Lraa (Fig. 4). At earlier phases of this outburst, the characteristic oscillation already 
occurred during a lower luminosity state, as discussed in the main text. 
Sample size. No statistical methods were used to predetermine sample size. 
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Extended Data Figure 1 | Optical and X-ray light curves of V404 Cyg magnitude dropped below V= 17.0. Superimposed on this rapid fading, 


during an outburst in 2015 June-July. a, Overall multi-colour light 
curves and Swift/BAT light curves. The plotted points are averaged 
for every 0.67 days. b, An enlarged view of the shaded box in a (the 


the amplitude of variations became progressively smaller and smaller. 
After BJD 2,457,205, the mean magnitude seemed to be constant, and the 
outburst virtually ended. 


first detection of short-term variations). On BJD 2,457,203, the mean 
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Extended Data Figure 2 | Additional examples of simultaneous optical and X-ray observations of V404 Cyg in the 2015 outburst. Data shown in 
Fig. 3 are excluded. a, b, Main panels, correlations on BJD 2,457,192 (a) and BJD 2,457,200 (b); right panels, Swift/XRT light curves on linear scales. 
Navy blue error bars, +1o. 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


a 11 T2 T3 T4 TS b 


i=) 
i=) 
Wt 
S = ' H ey Hamat i 
© te cae “i iia 
So { i ie 
7 
‘o 
S & 
- 2 Or 
ne : 
€ 2 
8 é 
gq Oo 
8 g 
3 
fo) 
’ S 
7,400 7,600 7,800 8,000 8,200 “0.6 1 2 5 10 
Time (s) Energy (keV) 
Extended Data Figure 3 | Example of the soft X-ray light curve and the X-ray data in Fig. 3a. b, Time-sliced soft X-ray spectra with pile-up 
spectra during the dip-type oscillation in the 2015 outburst of V404 correction, in the intervals of T1 to T5 determined in a. The exposures of 
Cyg. a, The ~860-s-long Swift/XRT raw light curve (BJD 2,457,194.125- individual spectra are ~100-300s. Error bars, +1o. 


2,457,194.135, ObsID 00031403040) without pile-up correction, same as 
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Extended Data Figure 4 | Comparison of the 1938, 1989 and 2015 outbursts of V404 Cyg. The horizontal axis represents days BJD — 2,429,186, 
BJD — 2,447,673 and BJD — 2,457,189, respectively. Photographic magnitudes are approximately the same as B band. 
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Extended Data Table 1 | A log of photometric observations of the 2015 outburst of V404 Cyg 


Start* End* Mag’ Error’ Ni Obs Band* Start* End* Mag’ _ Error’ N' Obs Band‘ 
0.274 0.295 13.24 0.032 215 TAO R 7.314 7.511 12.54 0.065 67 CRI Vv 
0.282 0.499 15.34 0.101 37 PZN CR 7.314 7.512 11.45 0.060 66 CRI Re 
0.386 0.426 15.18 0.036 86 PZN Re 7.315 7.511 10.36 0.061 65 CRI Io 
0.386 0.426 14.31 0.040 20 CRI&PZN Re 7.422 7.588 10.52 0.011 1501 IMi Ic 
1.137 1.192 14.92 0.024 61 PZN CR 7427 7.670 11.35 0.035 1104 deM Io 
1.274 1.398 13.93 0.086 20 PZN Re 7.675 7.802 10.81 0.012 1961 LCO Ie 
1.283 1.284 11.40 0.000 2 KW2 Io 7.707 7.945 10.70 0.019 1003 swi Ie 
1.283 1.284 12.50 0.000 2 KW2 Vv 7.744 7.907 12.98 0.036 350 GFB Vv 
1.283 1.284 911.95 0.068 2 KW2 B 8.030 8.300 10.22 0.020 535 KU1 Ic 
1.551 1.670 13.48 0.029 191 deM Io 8.032 8.035 11.69 0.036 5 OKU Vv 
1.627 1.810 14.98 0.008 2430 LCO CR 8.038 8.297 10.27 0.016 1022 OKU Io 
2.109 2.517 15.07 0.024 224 PZN CR 8.038 8.128 12.03 0.040 81 loh CR 
2.277 2.404 14.57 0.104 35 PZN Re 8.152 8.214 12.41 0.028 103 Wnm cG 
2.341 2.522 14.19 0.044 231 DPV CR 8.360 8.543 9.95 0.015 68 CRI Ie 
2.354 2.529 15.41 0.056 158 DPV Vv 8.394 8.619 10.41 0.011 623 Kai Ic 
2.354 2.529 14.12 0.049 158 DPV Re 8.419 8.671 10.17 0.012 1129 deM Ie 
2.380 2.505 17.19 0.075 61 Ter B 8.709 8.859 13.42 0.036 413 RIT Vv 
2.380 2.505 15.52 0.072 61 Ter Vv 8.969 9.043 10.87 0.019 296 Sac Ic 
2.381 2.506 14.33 0.062 61 Ter Re 8.993 9.154 10.55 0.024 608 Kis Ic 
2.406 2.524 14.65 0.024 354 Ter CR 9.006 9.044 12.77 0.032 40 Sac Vv 
2.422 2.615 14.43 0.045 151 Kai Io 9.179 9.315 12.49 0.053 146 PZN CR 
2.423 2.609 14.43 0.045 147 Kai Re 9.224 9.229 12.59 0.149 5 OKU Vv 
2.446 2.669 13.46 0.021 667 deM Io 9.239 9.300 10.84 0.020 152 OKU Io 
2.742 2.859 13.91 0.008 2652 LCO CR 9.382 9.620 10.40 0.002 643 Kai Io 
3.801 3.341 12.55 0.048 1216 TAO R 9.414 9.595 10.24 0.003 428 NDJ Ic 
3.251 3.524 16.10 0.054 186 Ter B 9.577 9.841 13.54 0.020 620 RIT B 
3.252 3.525 14.45 0.051 183 Ter Vv 9.607 9.798 12.21 0.005 4709 LCO Vv 
3.252 3.524 13.41 0.044 177 Ter Re 9.635 9.828 10.13 0.010 1823 LCO Io 
3.260 3.529 13.58 0.017 1278 Ter CR 9.744 9.911 12.38 0.031 350 GFB Vv 
3.266 3.308 13.54 0.086 48 PZN CR 10.027 10.028 11.90 0.018 3 Kis Vv 
3.271 3.307 13.64 0.091 40 PZN Re 10.029 10.201 10.54 0.011 837 Kis Ic 
3.410 3.489 15.80 0.095 38 CRI B 10.387 10.619 10.46 0.020 611 Kai Ic 
3.411 3.488 14.36 0.071 37 CRI Vv 10.415 10.670 10.38 0.013 1389 deM Ic 
3.411 3.488 13.17 0.062 37 CRI Re 10.744 10.910 11.99 0.010 349 GFB Vv 
3.411 3.489 12.01 0.058 37 CRI Io 11.182 11.300 9.41 0.012 99 KU1 Ie 
3.419 3.588 14.48 0.048 189 RPc Vv 11.291 11.298 10.55 0.003 112 TAO R 
3.428 3.553 14.52 0.056 128 Trt Vv 11.339 11.514 10.51 0.018 406 DPV Ie 
3.430 3.519 12.25 0.023 597 IMi Io 11.348 11.554 13.10 0.014 730 Trt Vv 
3.435 3.673 12.47 0.020 1036 deM Ig 11.372 11.515 13.15 0.019 335 DPV Vv 
3.525 3.650 12.64 0.075 37 coo Io 11.385 11.592 11.00 0.015 490 Kai Io 
3.530 3.820 14.53 0.076 165 Kis Vv 11.421 11.673 11.32 0.021 1314 deM Ie 
3.819 3.821 10.39 0.014 2 Kis Io 11.460 11.624 13.71 0.097 70 JSa Vv 
3.998 4.057 11.49 0.038 149 KU1 Io 11.483 11.603 13.53 0.016 374 RJV Vv 
4.059 4.311 12.04 0.036 397 Mdy Re 11.590 11.679 14.43 0.014 730 LCO Vv 
4.187 4.316 11.88 0.022 169 TAO R 11.679 11.834 12.95 0.008 3859 LCO Ie 
4.435 4.673 11.66 0.021 1089 deM Ie 12.228 12.232 15.49 0.139 5 OKU Vv 
4.546 4.649 13.41 0.041 82 Kis Vv 12.234 12.271 12.95 0.028 177. TAO R 
4.579 4.637 11.32 0.008 1416 LCO Ig 12.302 12.334 13.81 0.011 311 TAO R 
4.976 4.978 12.14 0.034 5 Kis Vv 12.386 12.611 13.89 0.025 484 Kai Ic 
4.979 4.981 10.04 0.004 3 Kis Io 12.405 12.670 14.08 0.022 640 deM Ic 
5.070 5.223 11.19 0.042 254 Mdy Re 12.484 12.599 16.87 0.031 237 RJV Vv 
5.426 5.481 12.75 0.054 36 CRI B 13.058 13.314 15.81 0.013 211 Mdy Re 
5.427 5.481 13.96 0.057 36 CRI Vv 13.199 13.334 15.97 0.048 1772 TAO R 
5.427 5.480 11.66 0.048 35 CRI Re 13.382 13.594 14.28 0.014 467 Kai Io 
5.427 5.480 10.57 0.044 36 CRI Io 13.415 13.670 14.08 0.022 640 deM Io 
5.448 5.633 10.45 0.010 840 deM Io 13.438 13.473 13.91 0.040 93 NDJ Ic 
5.595 5.670 10.25 0.020 25 coo Io 14.014 14.021 17.21 0.075 5 OKU Vv 
5.724 5.954 9.85 0.007 920 swi Io 14.026 14.168 14.75 0.022 97 OKU Io 
5.745 5.911 11.69 0.011 346 GFB Vv 14.043 14.276 14.70 0.016 361 Kis Io 
5.923 5.949 10.32 0.013 24 coo Ig 14.379 14.499 17.00 0.030 52 Trt CV 
6.011 6.015 12.51 0.016 5 OKU Vv 14.421 14.565 14.85 0.012 152 RPc Io 
6.019 6.076 10.27 0.005 154 OKU Vv 14.422 14.614 14.56 0.007 248 NDJ Ic 
6.146 6.157 10.01 0.050 4 KW2 Ie 14.504 14.517 17.15 0.114 5 Trt Vv 
6.146 6.157 12.02 0.121 4 KW2 Vv 14.601 14.810 15.56 0.003 1830 LCO CR 
6.146 6.157 13.15 0.159 2 KW2 B 15.166 15.276 16.92 0.234 664 TAO R 
6.182 6.281 10.55 0.048 129 Aka Re 15.356 15.549 16.04 0.007 244 DPV CR 
6.210 6.280 12.41 0.060 64 Aka Vv 15.364 15.550 17.41 0.024 42 DPV Vv 
6.293 6.554 11.31 0.030 85 CRI Re 15.434 15.559 14.84 0.017 81 RPc Io 
6.295 6.550 12.30 0.037 83 CRI Vv 15.694 15.762 14.45 0.008 166 swi Io 
6.346 6.428 11.84 0.028 93 PZN Re 16.092 16.142 14.60 0.397 5 TAO R 
3.356 6.543 9.94 0.011 412 DPV Io 16.302 16.377 16.07 0.013 26 PZN CR 
6.363 6.521 12.25 0.010 572 Trt Vv 16.320 16.525 14.44 0.010 129 CRI Io 
6.369 6.406 10.09 0.008 334 DPV Io 16.344 16.435 14.50 0.012 52 DPV Ic 
6.430 6.615 12.38 0.022 418 RJV Vv 16.516 16.530 14.44 0.030 6 RPc Ie 
6.584 6.827 12.65 0.005 5910 LcCO Vv 16.680 16.937 14.36 0.006 335 Swi Ic 
6.592 6.861 10.54 0.012 794 RIT Io 17.358 17.518 14.48 0.006 218 DPV Ic 
6.717 6.944 10.36 0.007 942 Swi Ie 17.418 17.671 14.78 0.006 309 deM Ic 
6.745 6.912 12.35 0.010 347 GFB Vv 17.440 gears 14.62 0.014 43 RPc Io 
6.919 6.950 10.35 0.076 24 coo Ie 18.297 18.336 17.37 0.254 470 TAO R 
7.056 7.057 13.27 0.010 3 Kis Vv 19.328 19.332 16.45 0.258 68 TAO R 
7.057 7.137 = 10.55 0.019 295 Kis Vv 19.403 19.451 14.82 0.011 33. DPV Io 
7.115 7.147 = 10.01 0.018 45 Aka Re 19.423 19.498 14.79 0.026 17 RPc Io 
7.144 7.150 9.87 0.016 18 KW2 Ig 19.712 19.761 16.61 0.008 60 GFB CV 
7.144 7.150 11.76 0.068 18 KW2 Vv 20.435 20.592 14.98 0.008 90 RPc Ic 
7.144 7.150 13.44 0.184 2 KW2 B 21.023 21.031 15.34 0.012 10 RPc Io 
7.313 7.512 13.79 0.063 67 CRI B 


Start and end dates of observations, mean magnitudes, 1a of mean magnitudes, numbers of observations, observers’ codes, and filters are summa- 
rized. Note that observers for TAOS used custom made filters close to the union of standard R and V3*9, but the magnitude reported in the present 
Letter was approximately calibrated to standard R. 

*JD—2,457,189 (days). 

tMean magnitude. 

flo of mean magnitude. 

§Number of observations. 

||Observer’s code: PZN (IKI GRB follow up network), CRI (Crimean Observatory Team), deM (E. de Miguel), DPV (P. A. Dubovsky), Ter (Terskol Observa- 
tory), Kai (K. Kasai), NDJ (N. James), RPc (R. D. Pickard), Trt (T. Tordai), COO (L. Cook), Kis (S. Kiyota), KU1 (Kyoto Univ. Team), Mdy (Y. Maeda), LCO (C. 
Littlefield), RIT (M. Richmond), RJV (R. Javier), GFB (W. Goff), SWI (W. L. Stein), OKU (Osaka Kyoiku Univ. team), Sac (A. Miyashita), IMi (I. Miller), TAO 
(TAOS Team), KW2 (H. Maehara), Aka (H. Akazawa), Wnm (K. Hirosawa) and JSa (J. Lluis). 

Filter. B, V, Rc, lc are the standard Johnson-Cousins system. ‘CR’ and ‘CV’ mean unfiltered CCD photometry with zero point adjustment in R and V, 
respectively. ‘cG’ means green (G) channel output in a digital single-lens reflex camera, which gives an approximate response close to V (ref. 80). 
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Extended Data Table 2 | List of instruments for optical observations 


CODE Telescope (& CCD) Observatory (or Observer) Site 
PZN 1m Zeiss-1000 Tien Shan +Apogee Alta Astronomical Observatory Almaty, Kazakhstan 
40cm ORI-40+FLI MLO9000 ISON-Khureltogot Mongolia 
70cm+FLI AS-32+FLI IMG6303E Abastumani observatory Georgia 
CRI 1.25m AZT-11+FLI ProLine PL230 Crimean astrophysical observatory Crimea 
38cm K-380+Apogee E47 Cremean astrophysical observatory Crimea 
deM 35cm SC+QSI-516wsg Observatorio Astronomico del CIECEM Huelva, Spain 
DPV 28cm SC+MIl G2-1600 Astronomical Observatory on Kolonica Slovakia 
35cm SC+MIl G2-1600 Astronomical Observatory on Kolonica Slovakia 
VNT 1m+FLI PL1001E Astronomical Observatory on Kolonica Slovakia 
Ter Zeiss-600 60cm+SBIG STL-1001E Terskol Observatory Russia 
S2C 35cm Terskol Observatory Russia 
Kai 28cm SC+ST7XME Kiyoshi Kasai Switzerland 
NDJ 28cm SC+ST9XE Nick James UK 
RPc FTN 2.0m+E2V 42-40 LCOGT* Hawaii, USA 
35cmSC+SXV-H9 CCD Roger D. Pickard UK 
Trt 25cm ALCCD5.2 (QHY6) Tamas Tordai Budapest, Hungary 
coo T07' 43cm+STL-1100M AstroCamp Observatory Nerpio, Spain 
T21' 43cm+FLI-PL6303E iTelescope.Net Mayhill New Mexico, USA 
T11' 50cm+FL! ProLine PL11002M iTelescope.Net Mayhill New Mexico, USA 
Kis 25cm SC+Alta F47 Seiichiro Kiyota Kamagaya, Japan 
T18' 32cm+STXL-6303E AstroCamp Observatory Nerpio, Spain 
T5' 25cm+ST-10XME iTelescope.Net Mayhill New Mexico, USA 
T24' 61cm+FLI-PLO9000 Sierra Remote Observatoy California, USA 
KU1 40cm SC+ST-9XEl Kyoto U. Team Kyoto, Japan 
Mdy 35cm SC+ST10XME Yutaka Maeda Nagasaki, Japan 
LCo 60cm+Apogee Alta U42 CCD Van Vleck Observatory Connecticut, USA 
40cm+SBIG STL-6303 Van Vleck Observatory Connecticut, USA 
RIT 30cm+ST-9E RIT Observatory New York, USA 
RJV LX200R 40cm+ST8 XME Observatorio de Cantabria Spain 
GFB CDK 50cm+Apogee U6 William Goff California, USA 
swi C14 35cmSC+ST10XME William L. Stein New Mexico, USA 
OKU 51cm+Andor DW936N-BV OKU Astronomical Observatory Osaka, Japan 
Sac 20cmL+ST-7XMEi Atsushi Miyashita Tokyo, Japan 
IMi 35cm SC+SXVR-H16 Furzehill Observatory UK 
TAO TAOS-B* 50cm+S!800 E2V47-20 Lulin Observatory Taiwan 
TAOS-D' 50cm+SI800 E2V47-20 Lulin Observatory Taiwan 


Observers’ codes (see Extended Data Table 1), names of telescopes and CCD cameras, observatory (or observer) and sites are summarized. 
*Las Cumbres Observatory Global Telescope Network. 


titelescope.net. 


+The Taiwanese-American Occultation Survey (TAOS)348182, 
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Extended Data Table 3 | Basic information on objects showing violent short-term variations in outbursts 


V404 Cyg GRS 1915+105 IGR J17091-3624 Rapid Burster V4641 Sgr 
Orbital period [d] 6.47129(7) (ref. 83) 33.85(16) (ref. 20) >4 (ref. 84) = 2.81678 (ref. 23) 
Compact object BH BH BH NS BH 
Spectrum of the secondary K3IIl (ref. 7) K-M (ref. 20) - - BOlll (ref. 23) 
M; ( Me) 9.0(0.6) (ref. 7) 10.1(0.6) (ref. 85) 11.8-13.7 (ref.86)  1.1(0.3)(ref.87) _7.1(0.3) (ref. 88) 
q=  M,/M, (Mass ratio) 0.06 (ref. 7) 0.042(0.024) (ref. 20) - - 0.45(0.05) (ref. 88) 
i [deg] (Inclination angle) 67(3) (ref. 7) 66(2) (ref. 89) 50-70 (ref. 90) - 72.3(4.1) (ref. 88) 
V magnitude minimum 18.4 (ref. 91) = = = 13.8 (ref. 88) 
V magnitude maximum 10.9 (This work) = = = 8.8 (ref. 41) 


Shown are orbital period, nature of the compact object, spectrum of the secondary, mass of the central object (M1), mass ratio (q), inclination angle (/), minimum magnitude (V band), and maximum 
magnitude (V band) on V404 Cyg, GRS 1915+105, IGR 17091-3624, the Rapid Burster, and V4641 Sgr. M2, mass of the secondary star. References are cited as follows: 7, 20, 23, 41, 83-91. 
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A continuum from clear to cloudy hot-Jupiter 
exoplanets without primordial water depletion 


David K. Sing!, Jonathan J. Fortney’, Nikolay Nikolov!, Hannah R. Wakeford!, Tiffany Kataria!, Thomas M. Evans!, 
Suzanne Aigrain®, Gilda E. Ballester*, Adam S. Burrows*, Drake Deming®, Jean-Michel Désert’, Neale P. Gibson’, 
Gregory W. Henry’, Catherine M. Huitson’, Heather A. Knutson!”, Alain Lecavelier des Etangs!!, Frederic Pont', 
Adam P. Showman‘, Alfred Vidal-Madjar!', Michael H. Williamson? & Paul A. Wilson!! 


Thousands of transiting exoplanets have been discovered, but 
spectral analysis of their atmospheres has so far been dominated 
by a small number of exoplanets and data spanning relatively 
narrow wavelength ranges (such as 1.1-1.7 micrometres). Recent 
studies show that some hot-Jupiter exoplanets have much weaker 
water absorption features in their near-infrared spectra than 
predicted'°. The low amplitude of water signatures could be 
explained by very low water abundances®*, which may be a sign 
that water was depleted in the protoplanetary disk at the planet’s 
formation location’, but it is unclear whether this level of depletion 
can actually occur. Alternatively, these weak signals could be the 
result of obscuration by clouds or hazes!~“, as found in some optical 
spectra**!°11, Here we report results from a comparative study of 
ten hot Jupiters covering the wavelength range 0.3-5 micrometres, 
which allows us to resolve both the optical scattering and infrared 
molecular absorption spectroscopically. Our results reveal a diverse 
group of hot Jupiters that exhibit a continuum from clear to cloudy 
atmospheres. We find that the difference between the planetary 
radius measured at optical and infrared wavelengths is an effective 
metric for distinguishing different atmosphere types. The difference 
correlates with the spectral strength of water, so that strong water 
absorption lines are seen in clear-atmosphere planets and the 
weakest features are associated with clouds and hazes. This result 
strongly suggests that primordial water depletion during formation 
is unlikely and that clouds and hazes are the cause of weaker spectral 
signatures. 

We observed the transits of eight hot Jupiters as part of a spectral 
survey of exoplanet atmospheres with the Hubble Space Telescope 
(HST). The eight planets covered in our survey (WASP-6b, WASP- 
12b, WASP-17b, WASP-19b, WASP-31b, WASP-39b, HAT-P-1b and 
HAT-P-12b) span a large range of planetary temperature, surface grav- 
ity, mass and radii, allowing for an exploration of hot-Jupiter atmos- 
pheres across a broad range of physical parameters (see Table 1). In 
this survey, we observed all eight planets in the full optical wavelength 
range (0.3-1.01 j1m) using the Space Telescope Imaging Spectrograph 
(STIS) instrument. We also used the Wide Field Camera 3 (WFC3) 
instrument to observe transits of WASP-31b and HAT-P- 1b in the 
near-infrared (1.1-1.7 1m), and used additional WFC3 programmes 
to observe transits of four other survey targets (WASP-12b, WASP- 
17b, WASP-19b and HAT-P-12b). The HST survey was complemented 
by photometric transit observations of all eight targets at 3.6 1m and 
4.5\.m using the Spitzer Space Telescope Infrared Array Camera 
(IRAC) instrument. We analysed the survey targets in conjunction 
with HST and Spitzer data from the two best-studied hot Jupiters to 


date, HD 209458b (ref. 1) and HD 189733b (ref. 5), giving a total of 
ten exoplanets in our comparative study with transmission spectra 
between 0.3 1m and 51m (see Extended Data Table 1 for a detailed 
list of the observations). 

Our data reduction methods followed those in our previous 
studies**!!-14, in which the transmission spectra of WASP-19b, WASP- 
12b, HAT-P-1b, WASP-6b and WASP-3 1b were presented (see Methods 
for further details). The transit light curves’ of the band-integrated spec- 
tra were fitted simultaneously with detector systematics, with all HST 
and Spitzer transit data used to determine the planets’ orbital system 
parameters (inclination, stellar density and transit ephemeris), which 
were then fixed to the weighted mean values in the subsequent analysis 
measuring the transmission spectra. To create the broadband trans- 
mission spectrum, we extracted various wavelength bins for the HST 
STIS and WEC3 spectra and separately fitted each bin for the planet- 
to-star radius ratio Rp/R» and detector systematics. The uncertainties 
for each data point were rescaled, based on the standard deviation of the 
residuals, and any systematic errors correlated in time were measured 
using the binned residuals'®. 

The resulting transmission spectra are shown in Fig. 1 and exhibit 
a variety of spectral absorption features due to Na, K and H,0, as 
well as strong optical scattering slopes (for example, WASP-6b and 
HAT-P-12b). Planets such as WASP-39b show prominent alkali 
absorption lines with pressure-broadened wings, whereas other 
planets such as WASP-31b show strong but narrow alkali features, 
which implies that these planets are limited to lower atmospheric 
pressures. HO vapour has been predicted to be an important source of 
opacity for hot-Jupiter atmospheres'”~', and it is detected in five of the 
eight exoplanets where WFC3 spectra are available! >!*4, However, 
the amplitude of the H,O absorption varies greatly across the ten plan- 
ets, ranging from features that are very pronounced (as in WASP- 
19b)!* to those that are much smaller than expected (HD 209458b)! 
or even absent (WASP-31b)*. 

Previous studies using HST/WFC3 spectra have shown that 
HD 209458b, HD 189733b and WASP- 12b have low-amplitude water 
features!*°, which can be attributed to a severe depletion of atmos- 
pheric HO abundance relative to solar values®*. Any such depletion 
would be a remnant of planet formation, as H2O is expected to be well 
mixed in a hot atmosphere, such that currently measured molecular 
abundances would be consistent with primordial values. The deple- 
tion of water vapour can occur beyond a protoplanetary disk’s snow 
line’, where water is found predominantly as solid ice. Therefore, a 
hot Jupiter with a large depletion in H2O gas would imply that the 
planet formed at large orbital distances beyond the snow line and, 


lAstrophysics Group, School of Physics, University of Exeter, Stocker Road, Exeter EX4 4QL, UK. 2Department of Astronomy and Astrophysics, University of California, Santa Cruz, California 

95064, USA. 2Department of Physics, University of Oxford, Keble Road, Oxford OX1 3RH, UK. “Lunar and Planetary Laboratory, University of Arizona, Tucson, Arizona 85721, USA. °Department of 
Astrophysical Sciences, Peyton Hall, Princeton University, Princeton, New Jersey 08544, USA. "Department of Astronomy, University of Maryland, College Park, Maryland 20742, USA. Department 
of Astrophysical and Planetary Sciences, University of Colorado, Boulder, Colorado 80309, USA. 8European Southern Observatory, Karl-Schwarzschild-Strasse 2, D-85748 Garching bei Munchen, 
Germany. °Center of Excellence in Information Systems, Tennessee State University, Nashville, Tennessee 37209, USA. !°Division of Geological and Planetary Sciences, California Institute of 
Technology, Pasadena, California 91125, USA. LICNRS, Institut dAstrophysique de Paris, UMR 7095, 98 bis boulevard Arago, 75014 Paris, France. 


7 JANUARY 2016 | VOL 529 | NATURE | 59 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


Table 1 | Physical parameters of hot Jupiters and associated spectral results 


Name Teg (K) g(ms~) Rp (Ry) Mp (M3) P (days) logR'uK AZup—tm/Heq AZ,_im/Heg H20 amplitude (%) Features Reference 

WASP-17b 740 3.6 89 0.51 3.73 —5.531 —0.80+0.36 -—1.48+0.71 94429 Na, H20 

WASP-39b ,120 4.1 2/7 0.28 4.06 —4.994 0.10+0.41 Na, K 

HD 209458b 450 9.4 36 0.69 3.52 —4.970 0.73 40.36 0.49+0.36 3245 Aer, Na, H20 

WASP-19b 2,050 14.2 Al 1.14 0.79 —4.660 1.04+41.79 —1.9741.32 105+20 H20 14 

HAT-P-1b 320 7.5 32 0.53 4.46 —4.984 2.0140.81 0.19+0.93 68+19 Na, H20 12,13 

WASP-31b 980 4.6 55 0.48 3.40 —5.225 2.15+0.77 1.2540.77 31412 Aer, K 4 

WASP-12b 2,510 11.6 73 1.40 1.09 —5.500 3.76+1.59 65+ 1.47 38434 Aer 3 

HAT-P-12b 960 5.6 0.96 0.21 3.21 —5.104 4.14+40.77 3740.79 17423 Aer, K 

HD 189733b ,200 214 14 1.14 2.22 —4.501 5.52+0.50 0.56+0.50 53.649.6 Aer, Na, H20 5,10 

WASP-6b ,150 8.7 22 0.50 3.36 —4.741 8.49 41.33 Aer, K 11 
The listed physical parameters are based on data compiled from our HST and Spitzer results?+1°-14 and online databases. Sources for published spectral results are also listed. Atmospheric features 
detected of cloud or haze aerosols, sodium, potassium and water are listed (Aer, Na, K and H20, respectively). The equilibrium temperature Teg assumes zero albedo and uniform redistribution. Also 
listed are the surface gravity, g; radius of the planet, R,; planet mass, M,; orbital period, P; and Ca lH and K stellar activity index logR’yx. Ry is the radius of Jupiter and My is the mass of Jupiter. 
AZus—.m/Heg gives the difference in pressure scale heights between the optical and mid-infrared transmission spectra, while AZ) Lw/Heg is the difference between the near- and mid-infrared 


(see Methods). The atmospheric scale height, Heq=kTeq/(yug), is estimated using the planet-specific equilibrium temperature and assuming a H/He atmosphere with a mean molecular mass of p= 2.3 
atomic mass units. The H20 amplitude is measured using the WFC3 data, taking the average radii from 1.34\1m to 1.49,.m and subtracting it from the average value between 1.224.m to 1.33,.m, then 
dividing that value by the theoretical difference as calculated by models!® assuming clear atmospheres and solar abundances. 
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Figure 1 | HST/Spitzer transmission spectral sequence of hot-Jupiter respectively. Planets with predominantly clear atmospheres (top) 
survey targets. Solid coloured lines show fitted atmospheric models show prominent alkali and HO absorption, with infrared radii values 
with prominent spectral features indicated. The spectra have been offset, commensurate with or higher than the optical altitudes. Very hazy and 
ordered by values of AZyg —.m (the altitude difference between the blue- cloudy planets (bottom) have strong optical scattering slopes, narrow 
optical and mid-infrared; Table 1). Horizontal and vertical error bars alkali lines and HO absorption that is partially or completely obscured. 


indicate the wavelength spectral bin and 1o measurement uncertainties, 
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Figure 2 | Pressure-temperature profiles and condensation curves. 
Profiles are calculated for each planet from one-dimensional non-grey 
radiative transfer models!’, which assume planet-wide average conditions 
in chemical equilibrium at solar abundances, and clear atmospheres. 
Profiles take into account incident stellar fluxes as well as the planetary 
interior fluxes that are appropriate given each planet's known mass and 
radius. Dashed and dotted lines are calculations of condensation curves 
of chemical species expected to condense in planetary and brown dwarf 
atmospheres». The thicker portions of the pressure-temperature profiles 
indicate the pressures probed in transmission. 


during its inward orbital migration, avoided accretion and dissolution 
of icy planetesimals as well as the subsequent accretion of an appreci- 
able amount of H,O-rich gas. Such scenarios have been proposed for 
Jupiter?””! based on Galileo probe measurements” that indicate it to 
be a water-poor gas giant, although the measurements were affected 
by local meteorology”. 

However, it is possible that these weak water absorption bands 
could be attributed to cloud opacity, which have yielded featureless 
transmission spectra for a number of transiting exoplanets”**. For 
simplicity, we define a cloud as a grey opacity source, and a haze 
as one that yields a Rayleigh-scattering-like opacity, which could be 
due to small (sub-micrometre size) particles. Silicate or higher-tem- 
perature cloud condensates are expected to dominate the hotter 
atmospheres, like those observed for brown dwarfs, while in cooler 
atmospheres sulfur-bearing compounds are expected to play an 
important part in the condensation chemistry”. In Fig. 2, we plot 
model atmospheric pressure-temperature profiles for the planets in 
our comparative study and compare them to the condensation curves 
for the expected cloud-forming molecules. The base, or bottom, of a 
condensate cloud is expected to form where the planetary pressure- 
temperature profiles cross the condensation curve; in this case, Cr, 
MnS, MgSiO3, Mg»SiO, and Fe are possible condensates. For exam- 
ple, the spectra of WASP-31b shows clouds‘, which probably form 
at pressures of about 10 mbar and can be explained by Fe or MgSiO4 
condensates. However, the curves alone cannot explain cloudy ver- 
sus cloud-free planets, because hazy planets such as HAT-P-12b and 
WASP-12b do not cross condensation curves at observable pressures. 
Therefore, atmospheric circulation must also play a part, as vertical 
mixing allows for particles to be lofted and maintained at pressures 
probed in transmission at the terminators. Additionally, equatorial 
eastward superrotation arising from day-night temperature variations 
can allow clouds that form on the nightside to be transported to the 
terminator?’. 

We compare spectral features from our large survey to both ana- 
lytic*+”® and radiative-transfer models assuming varying degrees of 
clouds and hazes'”!®. In order to evaluate the spectral behaviour of 
the sample as a whole, we define and measure three broadband spectral 
indices, which can then be compared to both the observational data 
and the theoretical models (see Table 1 and Methods). We first define 
an index AZyp —1m that compares the relative strength of scattering, 
which is strongest at blue-optical (0.3—-0.57 |1m) wavelengths, to that 
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Figure 3 | Transmission spectral index diagram of AZ; _ 1m versus 
H,0 amplitude. Black points show the altitude difference between the 
near-infrared and the mid-infrared spectral features (AZ) _ 1) versus the 
amplitude of the 1.4-\1m HO feature for eight of ten targets (see Table 1). 
Error bars represent the lo measurement uncertainties. Purple and grey 
lines show model trends for hazy and cloud atmospheres, respectively, 
with increasing Rayleigh scattering haze and grey cloud deck opacity 
corresponding to 10x, 100 and 1,000x the solar value. We also show 
clear-atmosphere models with sub-solar abundances of 0.1, 0.01 x 

and 0.001 x the solar value (red line). WASP-6b and WASP-39b are not 
included because there are currently no HST WEC3 data for these two 
planets. 


of molecular absorption, which is strongest at mid-infrared (3-5 1m) 
wavelengths and dominated by H2O, CO and CHy. We also define 
AZ;_1m to measure the relative strength between the near-infrared 
continuum (1.22-1.33 1m, located between strong H,O absorption 
bandheads) and the mid-infrared molecular absorption. Lastly, we 
quantify the amplitude of the H2O absorption feature seen in the WFC3 
data, calculating the ratio of the observed feature to that of radiative 
transfer models'” assuming clear atmospheres and solar abundances. 

Comparisons between these indices (Fig. 3, Extended Data Figs 1 and 2) 
show trends between cloudy and cloud-free planets. When comparing 
the AZ; _ 1m index to the H2O amplitude (Fig. 3), the hot-Jupiter trans- 
mission spectra strongly favour models in which the H2O amplitude is 
lower owing to obscuration by hazes and clouds, rather than to lower 
abundances (5.9o significance). Contaminating effects of persistent 
unocculted star spots’ and plages”® have been proposed in order to 
mimic the optical haze-scattering signature of hot Jupiters (particularly 
HD 189733b, which orbits an active star; see Methods). However, our 
survey sample is sufficiently varied in stellar activity, such that we find 
no correlation between stellar activity and the strength of the optical 
scattering slope (as measured by the AZyg — 1m index) for planets in our 
sample (Extended Data Fig. 3). One of the main distinguishing features 
between hazy atmospheres and those that are clear and have sub-solar 
abundances resides in the near-infrared continuum, measured with the 
WEC3 spectra. The presence of haze raises the level of the near-infrared 
continuum relative to the mid-infrared continuum, leading to high 
AZ;_ 1m index values with low near-infrared H,O amplitudes 
(Extended Data Fig. 4). In clear-atmosphere models, where the abun- 
dances are lower, the continuum level drops at both near- and mid- 
infrared wavelengths, accompanied by a reduction in the amplitude of 
absorption features, resulting in AZ; _ 1. index values that are too low 
to explain the data (Fig. 3). 

The hot-Jupiter transmission spectra ordered by the AZyg_1m spec- 
tral index reveals a continuum from clear atmospheres to atmospheres 
with strong clouds and hazes (Fig. 1 and Table 1). The presence of 
clouds has also been inferred for brown dwarf atmospheres, which 
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have similar temperatures to hot Jupiters””. Although there is a well 
defined sequence from the warmer, cloudy L-dwarfs to the cooler, clear 
T-dwarfs??*°, hot Jupiters do not exhibit a strong relationship of tem- 
perature to cloud formation, given that both cloudy and not cloudy 
planets appear throughout the entire 1,000-2,500 K temperature range 
(Fig. 2). 

We suggest that the difference between hot Jupiters and brown 
dwarfs is due to the vertical temperature structure of hot-Jupiter atmos- 
pheres. Hot Jupiters have very much steeper pressure-temperature 
profiles compared to isolated brown dwarfs, owing to the strong 
incident stellar flux heating the top of the planetary atmosphere (see 
Extended Data Fig. 5). Since cloud condensation curves run nearly 
parallel to hot-Jupiter profiles, a relatively small temperature shift 
(about 100 K) could easily move a cloud base by a factor of tens or 
hundreds in pressure, in or out of the visible atmosphere. In compari- 
son, because brown dwarfs have shallow pressure-temperature profiles, 
clouds will form in the visible atmosphere across a very wide temper- 
ature range. Furthermore, the expected nearly isothermal region of a 
hot-Jupiter profile at pressures from about 1 bar to 100 bar may cause 
some planets, but not others, to have cloud materials cold-trapped at 
depth, out of the visible atmosphere. Given this temperature sensitivity, 
the role of clouds in hot Jupiters may appear almost stochastic from 
planet to planet. In addition, hot Jupiters have a wider range of gravities 
and metallicities, both of which will affect the planet’s atmospheric 
temperature structure, circulation and condensate formation. 

Future studies will benefit greatly from broad atmospheric surveys 
that can further distinguish between clear and cloudy exoplanets. If the 
AZus—tm AZ;—~ im and H,O indices can be measured in advance of 
such surveys, planets with clear atmospheres can be identified and stud- 
ied in greater detail, allowing reliable chemical abundances to be meas- 
ured and thus providing valuable constraints on formation models. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


HST observations. The overall observational strategy is similar for each of the 
eight targets in the Large HST programme (GO-12473; principal investigator 
D.K.S.), which have been presented for WASP-19b!*, HAT-P-1b!*3, WASP-12b°, 
WASP-31b* and WASP-6b' with the details summarized here and applied to the 
remaining targets HAT-P-12b, WASP-17b and WASP-39b. We observed two tran- 
sits of each target with the HST STIS G430L grating, and one with the STIS G750L. 
The G430L and G750L data sets contain typically 43 to 53 spectra, which span 
either four or five spacecraft orbits and were taken with a wide 52 arcsec x 2 arcsec 
slit to minimize slit light losses. Both gratings have resolutions of R of /AA= 
530-1,040 (~2 pixels is 5.5 A for G430L and ~2 pixels is 9.8 A for G750L). The 
G430L grating covers the wavelength range from 2,900 A to 5,700 A, while the 
G750L grating covers 5,240 A to 10,270 A. The visits of HST were scheduled such 
that the third and/or fourth spacecraft orbits contain the transit, providing good 
coverage between second and third contact, as well as an out-of-transit baseline 
time series before and after the transit. Exposure times of 279s were used in con- 
junction with a 128-pixel-wide sub-array, which reduces the readout time between 
exposures to 21s, providing a 93% overall duty cycle. 

The STIS data set was pipeline-reduced with the latest version of CALSTIS, 
and cleaned for cosmic ray detections with a customized procedure'!. The G750L 
data set was defringed using contemporaneous fringe flats. The spectral aperture 
extraction was done with IRAE, using a 13-pixel-wide aperture with no background 
subtraction, which minimizes the out-of-transit standard deviation of the white- 
light curves. The extracted spectra were then Doppler-corrected to a common 
rest frame through cross-correlation, which helped remove sub-pixel wavelength 
shifts in the dispersion direction. The STIS spectra were then used to create both 
a white-light photometric time series and custom wavelength bands covering the 
spectra, integrating the appropriate wavelength flux from each exposure for dif- 
ferent bandpasses. 

Observations of HAT-P-1b and WASP-31b were also conducted in the infrared 
with the HST WFC3 instrument as part of GO-12473 and are detailed in refs 4 
and 13. The observations use the infrared G141 grism in forward spatial scan 
mode over five HST orbits. Spatial scanning is done by slewing the telescope in 
the cross-dispersion direction during integration in a similar manner for each 
exposure, which increases the duty cycle and greatly increases the counts obtained 
per exposure. We used the ‘ima’ outputs from the CALWFC3 pipeline, which per- 
forms reference pixel subtraction, zero-read and dark current subtraction, and a 
nonlinearity correction. For the spectral extraction, we trimmed a wide box around 
each spectral image, with the spectra extracted using custom routines from the 
programming language IDL, similar to IRAF’s procedure from the APALL pro- 
gram. The aperture width was determined by minimizing the standard deviation of 
the fitted white-light curve. The aperture was traced around a computed centring 
profile, which was found to be consistent in the y axis with an error of <0.1 pixels. 
Background subtraction was applied using a clean region of the untrimmed image. 
For wavelength calibration, direct images were taken in the F139M narrow-band 
filter at the beginning of the observations. We assumed that all pixels in the same 
column have the same effective wavelength, as the spatial scan varied in the x-axis 
direction by less than one pixel, resulting in a spectral range from 1.1 1m to 1.7|1m. 
This wavelength range was later restricted to avoid the strongly sloped edges of the 
grism response, which results in much lower signal-to-noise light curves. 

For the comparative study, we also included the WFC3 observations for WASP- 
19b!4, HD 209458b!, HAT-P-12b? and WASP-17b*! (GO-12181; principal investi- 
gator D.D.). The WFC3 observations of WASP- 12b? were also included (GO-12230; 
principal investigator M. R. Swain), as was HD 189733b° (GO-12881; principal 
investigator P. R. McCullough). The WFC3 observations of WASP-12b, WASP-17b, 
WASP- 19b and HAT-P-12b were observed in stare mode, rather than with spatial 
scanning, and therefore have generally poorer overall photometric precision. See 
Extended Data Table 1 for a list of all observations. 

Spitzer observations. The eight targets in the large HST survey were also all cov- 
ered by Spitzer transit observations as part of an Exploration Science Programme 
(90092; principal investigator J.-M. Désert) obtained using the Infrared Array 
Camera (IRAC) instrument with the 3.6-j1m and 4.5-j1m channels in subarray 
mode (32 x 32 pixels). Photometry was extracted from the basic calibrated FITS 
data cubes, produced by the IRAC pipeline after dark subtraction, flat-fielding, lin- 
earization and flux calibration. The images contain 64 exposures taken in sequence 
and have per-image integration times of 1.92 s. Both channels generally show a 
strong ramp feature at the beginning of the time series, and we elected to trim the 
first ~20 min of data to allow the detector to stabilize. We performed outlier filter- 
ing for hot (energetic) or cold (low-count values) pixels in the data by examining 
the time series of each pixel and subtracted the background flux from each image". 

We measured the position of the star on the detector in each image incorpo- 
rating the flux-weighted centroiding method using the background subtracted 
pixels from each image, for a circular region with a radius of 3 pixels centred on the 
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approximate position of the star. We extracted photometric measurements from 
our data using both aperture photometry from a grid of apertures ranging from 
1.5 to 3.5 pixels (in increments of 0.1) and time-variable aperture photometry. The 
best result was selected by measuring the flux scatter of the out-of-transit portion 
of the light curves for both channels after filtering the data for 50 outliers with a 
width of 20 data points. 

Transit light curve analysis. All the transit light curves were modelled with ana- 
lytical transit models!5. For the white-light curves, the central transit time, orbital 
inclination, stellar density, planet-to-star radius contrast, stellar baseline flux and 
instrument systematic trends were fitted simultaneously. The period was initially 
fixed to a literature value, before being updated, with our final fits adopting the 
values obtained from an updated transit ephemeris. Both G430L transits were 
fitted simultaneously with a common inclination, stellar density and planet-to-star 
radius contrast. The results from the HST white-light curve and Spitzer fits were 
then used in conjunction with literature results to refine the orbital ephemeris and 
overall planetary system properties. To account for the effects of limb-darkening on 
the transit light curve, we adopted the four-parameter nonlinear limb-darkening 
law, calculating the coefficients with stellar models**"°. 

As in our past STIS studies, we applied orbit-to-orbit flux corrections by fitting 
for a low-order polynomial to the photometric time series phased on the HST 
orbital period. The baseline flux level of each visit was free to vary in time linearly, 
described by two fit parameters. In addition, for the G750L we found it justified 
by the Bayesian Information Criteria* to also linearly fit for two further system- 
atic trends which correlated with the x and y detector positions of the spectra, as 
determined from a linear spectral trace in IRAF. The orders of the fit polynomials 
were statistically justified based on the Bayesian Information Criteria*, and the 
systematic trends were fitted simultaneously with the transit parameters. 

The errors on each data point were initially set to the pipeline values, which 
are dominated by photon noise but also includes readout noise. The best-fitting 
parameters were determined simultaneously with a Levenberg-Marquardt least- 
squares algorithm* using the unbinned data. After the initial fits, the uncertainties 
for each data point were rescaled based on the standard deviation of the residuals 
and any measured systematic errors correlated in time (‘red noise’), thus taking into 
account any underestimated errors calculated by the reduction pipeline in the data 
points. The uncertainties on the fitted parameters were calculated using the covar- 
iance matrix from the Levenberg-Marquardt algorithm, which assumes that the 
probability space around the best-fitting solution is well described by a multivariate 
Gaussian distribution and equivalent results were found when using an Markov 
Chain Monte Carlo analysis*®. Inspection of the two-dimensional probability dis- 
tributions from both methods indicated that there were no significant correlations 
between the planet-to-star radius contrasts and systematic trend parameters. 

In an additional analysis step compared to our previous results*!!?, we also 

marginalized over the systematic models* for the spectra of WASP-17b, WASP-39b, 
HAT-P-1b, HAT-P-12b and HD 209458b. Under this approach, we effectively aver- 
aged the results obtained from a suite of systematics models in a coherent manner. 
For each systematic model used to correct the data, we calculated the evidence 
of fit, which is then used to apply a weight to the parameter of interest (R,/R+) 
measured using that model. In doing so, we marginalized over our uncertainty 
as to selecting which model is actually the ‘correct’ model. For the STIS data we 
included all combinations of factors up to the 4th order in both HST phase, 3rd 
order in detector positions x and y, 3rd order in wavelength shift, and 1st order 
in time. For the WFC3 data, our grid of parameterized models includes all com- 
binations of factors up to the fourth order in both HST phase, to correct for ‘HST 
breathing’ effects, and up to the fourth order in wavelength shift, in addition to 
the visit-long linear trend. In addition, we also included exponential HST phase 
models, with a linear and squared planetary phase trend. For the Spitzer data, 
we included all combinations of the x and y positions of the stellar point spread 
function on the detector, including the cross-product from polynomials of x and y 
up to a second-order. We note that the best-fitting systematics models for HST 
and Spitzer are generally well constrained and the marginalized results were very 
similar to those based on model selection by the Bayesian Information Criteria™. 
For HD 209458b, lightcurve analyses and marginalization were performed using 
Gaussian process models**. Owing to the flexibility of Gaussian process models, a 
broad range of systematics behaviours can be captured without the need to provide 
an explicit functional form. The results of a single Gaussian process model are thus 
comparable to marginalizing over many simpler parametric systematics models, 
as was done for the other lightcurves*”. 
Atmospheric models. The synthetic spectra’”*” used for this study include isother- 
mal models as well as those with a self-consistent treatment of radiative transfer 
and chemical equilibrium of neutral and ionic species. Chemical mixing ratios and 
opacities were calculated assuming local thermochemical equilibrium accounting 
for condensation and thermal ionization but not photoionization*”’, for both 
solar metallicity and sub-solar metallicity abundances. 
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A simplified treatment adding in small aerosol haze particles was performed 
by including a Rayleigh scattering opacity (that is, 7=00(A/ Xo)~*) that had a 
cross-section which was 10x, 100 and 1,000 the cross-section of molecular 
hydrogen gas (9 =5.31 x 10°*”cm? at Ay = 350 nm ref. 44). Similarly, to include 
the effects of a flat cloud deck we included a wavelength-independent cross-section, 
which was 1x, 10 and 100x the cross-section of molecular hydrogen gas at 
350 m (see Extended Data Fig. 4). 

Transmission spectral indices. To enable a direct comparison between planets, 
the transmission spectra have been plotted on a common scale by dividing the 
measured wavelength-dependent altitude of the transmission spectra, z(\), by the 
planet’s atmospheric scale height (Hq, the vertical distance over which the gas 
pressure drops by a factor of e) estimated using the equilibrium temperature. The 
analytical relation for the wavelength-dependent transit-measured altitude z(,) 


of a hydrostatic atmosphere is“: 
ePo(X) | 2mRp 
tT \kTpg 


where ¢ is the abundance of the absorbing or scattering species, P is the pressure 
at a reference altitude, o(A) is the wavelength-dependent cross-section, T is the 
optical thickness at the effective transit-measured radius, k is Boltzmann’s constant, 
T is the local gas temperature, is the mean mass of the atmospheric particles, g the 
planetary surface gravity, R, the planetary radius, and H=kT/jg is the atmospheric 
pressure scale height. The altitude difference measured between two wavelength 
regions (\ and }’) in a transmission spectrum is proportional to the quantity: 


z(A) =HIn 


2(A) — 2(A’) = HIn(a/a’) 
where a is the absorption plus scattering extinction coefficient: 
a=eo(A) 


Thus, the quantity AZ) — y= z(A) — 2(X’) is related to the ratio of the total scatter- 
ing plus absorption of the atoms and molecules between the wavelength regions 
and \’, and we use the quantity AZ) _ y/Heq=In(a/a’) as a metric to intercompare 
the atmospheric extinction for the different planets in our survey. Note that the 
temperature and scale height of the upper atmosphere can differ from the equi- 
librium value, especially at high altitudes where hot upper layers in hot Jupiters 
have been found*?-**, 

We defined indices around three main wavelength regions (see Table 1). We 
used a blue-optical band consisting of the G430L grating, which is sensitive 
between 0.3 1m and 0.57 1m and roughly covers the Johnson U and B photomet- 
ric bandpasses. This wavelength region is almost always exclusively dominated by 
scattering for clear, cloudy and hazy exoplanets (see Extended Data Fig. 4). The 
second is a near-infrared band between 1.22 1m and 1.33,.m, which has overlap 
with the Johnson J photometric band, and is located between the strong HO 
absorption bands centred around 1.15}1m and 1.4j1m. This wavelength region is 
sensitive to the scattering continuum in hazy, cloudy and highly sub-solar models 
and the HO continuum in clear atmospheres with abundances near solar (see 
Extended Data Fig. 4). We also used a third wavelength region in the mid-infrared 
between 31m and 541m, which overlaps with the Johnson L and M photometric 
bandpasses and consists of the two Spitzer IRAC photometric channels 1 and 2. 
This wavelength region is highly sensitive to strong HO, CO and CH, absorption 
bands, which are the main active molecular species expected in hot Jupiters!”"!”, 
and only sensitive to scattering in the cloudiest cases, making it an overall effective 
measure of the total molecular extinction (see Extended Data Fig. 4). 

From the data, AZyg— tm was measured taking the difference between the 
planet radius measured in the blue-optical HST data using the G430L grating 
(UB, wavelengths 0.3-0.57 1m) and the weighted-average value of the radii meas- 
ured in Spitzer IRAC photometric channels 1 and 2 (LM, wavelengths 3-5 1m). 
AZj_.m was measured similarly, although using the near-infrared WFC3 data 
(J, wavelengths 1.22-1.33 1m). 

In addition, we also measured the amplitude of the near-infrared H2O absorp- 
tion band using the WFC3 spectra (see Table 1), measuring the average radii in 
a band containing strong H2O absorption (between 1.34|1m and 1.49|1m) com- 
pared to an adjacent band between strong H2O features (1.22}1m and 1.33 }1m). 
The measured HO amplitude for each exoplanet was then divided by the value 
predicted by atmospheric models!”*? calculated for each planet using a planet- 
averaged temperature-pressure profile assuming clear atmospheres and solar 
abundances. 

From Fig. 3, a likely inverse correlation is seen between the H,O amplitude and 
the AZ;_1/Heq index, with the Spearman's rank correlation coefficient measured 
to be —0.76 which has a false alarm probability of 2.8%. We note that this false alarm 
probability is not the probability that the water depletion scenario is correct, as that 


is excluded with Fig. 3 to a much higher degree (5.90 significance). A much weaker 
inverse correlation of —0.48 is found with AZyg_ 1m in Extended Data Fig. 2, 
although that has a high false alarm probability of 23%. 

Stellar activity. As stellar activity can affect the measurement of a transmission 
spectrum, we photometrically monitored the activity levels of our target stars with 
the Cerro Tololo Inter-American Observatory (CTIO) 1.3-m telescope for the 
southern targets’ and the Tennessee State University Celestron 14-inch (C14) 
Automated Imaging Telescope (AIT) located at Fairborn Observatory in Arizona 
for the northern targets”’. All but two of our targets showed low levels of stellar 
activity, with observed photometric variations or upper limits which are sufficiently 
small that their effects on measuring the transmission spectra are minimal com- 
pared to the measurement errors®*!!-!3. The two most active stars in the survey, 
WASP-19A and HD 1897334, were corrected for occulted and un-occulted star 
spots!”"*, As no contemporaneous photometric monitoring of WASP-19A is avail- 
able for the July 2011 WFC3 spectra from ref. 14, we matched the spectra to the 
spot-corrected transit depth of R,/R«=0.14019 + 0.00073 as measured using HST 
WEC3 on 12 June 2014 from GO-13431 (principal investigator C. M. Huitson), 
which had simultaneous CTIO activity monitoring. We also normalized the differ- 
ential transit depths of the WFC3 spectra’ to a transit depth value consistent with 
ref. 10, which has a uniform treatment between the HST and Spitzer data sets of 
system parameters, limb-darkening and activity correction. 

As effects of stellar activity could potentially mimic an optical scattering slope in 
a transmission spectra®!°?85, we searched for a relationship between the activity 
levels of the stars in our survey and the presence of a strong optical slope. If stellar 
activity were the main cause of the enhanced optical slopes, rather than scattering 
by hazes or clouds, then it is expected that highly active stars would have higher lev- 
els of spots and plages, and should show preferentially larger transmission spectral 
slopes. As an additional measure of stellar activity, we used the strength of the Ca 11 
H and K emission lines as a stellar activity indicator (logR’}x), as measured by 
Keck HIRES°°); see Table 1. We searched for a correlation with the chromospheric 
activity index logR’jx, as it is correlated with the stellar photometric variability” 
and can be used to quantify stars with low activity levels, for which the photometric 
variations would be undetectable. We found no significant correlation with logR’x 
activity and either the presence of haze or the strength of optical transmission 
spectral slope, as measured with the AZ;_ 1. index (Extended Data Fig. 3). This 
suggests that the effects of stellar activity are not the overall cause of the strong 
optical slopes seen in some of the transmission spectra. 

There are also other indications that stellar activity does not have a dominant 
role. For one, while changing stellar activity levels should have an effect on the 
transmission spectra, no significant variations were seen between the three epochs 
of the HST STIS spectra, which has an overlapping wavelength region, for all of 
our targets, including active stars. In addition, the atmospheric temperature can be 
derived by measuring the transmission spectral slope in an atmosphere dominated 
by Rayleigh scattering*!', and the temperatures found fitting a Rayleigh scatter- 
ing slope for HD 189733b, HAT-P-12b and WASP-6b (1,340 + 150K, 1,010 + 80K 
and 973 + 144K, respectively) are in good agreement with the planetary temper- 
atures Teg expected (1,196 K, 958 K and 1,183 K, respectively). This agreement is 
consistent with the atmospheric temperature, rather than stellar activity, being 
probed by the scattering haze. For these three stars, where HAT-P-12 has a much 
lower activity than the other two, the individual activity levels would have to be 
finely tuned for the spectral slopes to mimic the planetary temperatures. 

In addition to condensation chemistry°’, hazes can also form through pho- 
tochemical processes resulting in hydrocarbon aerosols”. This process is more 
effective for cooler exoplanets*> and the incident stellar ultraviolet irradiation 
also plays an important factor in hydrocarbon formation™. Our results indicate 
no correlation with the presence of haze to either the atmospheric temperature 
or levels of ultraviolet irradiation (as traced by stellar activity indicators), which 
generally favours condensation chemistry over photochemical processes as the 
general source of the observed hazes and clouds. 

Code availability. We have opted not to make the customized IDL codes used 
to produce the spectra publicly available owing to their undocumented 
intricacies. 
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Extended Data Table 1 | Summary of observations 


Optical near-IR mid-IR 
HST STIS, ACS HST WFC3 Spitzer IRAC 
0.3-1 pm 1.1-1.7 pm 3-5.2 um 
grating, Date (UT) grism, Date (UT) band, Date (UT) 


G430L, 2012/05/26 
G750L, 2012/05/30 G141, 2012/07/05 
G430L, 2012/09/19 


G430L, 2012/06/10 
G430L, 2012/06/16 
G750L, 2012/07/23 


G430L, 2012/03/14 

G430L, 2012/03/27 G141, 2011/04/12 
G750L, 2012/09/09 

G430L, 2012/06/13 

G430L, 2012/06/26 G141, 2012/05/13 
G750L, 2012/07/10 

G430L, 2012/04/11 

G430L, 2012/04/30 G141, 2011/05/29 
G750L, 2013/02/04 

G430L, 2012/04/30 G141, 2011/07/01 
G430L, 2012/05/04 G141. 2014/06/12 
G750L, 2012/05/09 , 


G430L, 2013/02/08 
G430L, 2013/02/12 
G750L, 2013/03/17 


G430L, 2012/06/08 
G430L, 2013/03/15 G141, 2011/07/08 
G750L, 2013/03/19 


G430L, 2003/05/03 
G750L, 2003/05/31 3.6, 2007/12/31 
G430L, 2003/06/25 3.6, 2008/07/19 
G750L, 2003/07/05 3.6, 2011/01/14 
G750M, 2000/04/25 eral eaves 3.6, 2014/01/19 
G750M, 2000/04/28 4.5, 2008/07/22 
G750M, 2000/05/05 4.5, 2010/01/19 
G750M, 2000/05/12 
G800L, 2006/05/22 
G800L, 2006/05/26 
G800L, 2006/07/14 3.6, 2007/11/25 
G430L, 2009/11/20 3.6, 2006/10/30 
HD 189733b G430L, 2010/05/18 G141, 2013/06/05 3.6, 2010/12/29 
G750M, 2009/10/30 4.5, 2007/11/22 
G750M, 2009/11/13 4.5, 2009/12/23 
G750M, 2010/09/30 
G750M, 2010/11/29 


Transit observations using the Hubble and Spitzer Space Telescopes. Dates are given in universal time (UT) listed along with the instruments and wavelength ranges used. 
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Evidence for a new phase of dense hydrogen above 


325 gigapascals 


Philip Dalladay-Simpson!, Ross T. Howie!+ & Eugene Gregoryanz!* 


Almost 80 years ago it was predicted that, under sufficient 
compression, the H-H bond in molecular hydrogen (H2) would 
break, forming a new, atomic, metallic, solid state of hydrogen!. 
Reaching this predicted state experimentally has been one of the 
principal goals in high-pressure research for the past 30 years. Here, 
using in situ high-pressure Raman spectroscopy, we present evidence 
that at pressures greater than 325 gigapascals at 300 kelvin, H, and 
hydrogen deuteride (HD) transform to a new phase—phase V. 
This new phase of hydrogen is characterized by substantial weakening 
of the vibrational Raman activity, a change in pressure dependence 
of the fundamental vibrational frequency and partial loss of the 
low-frequency excitations. We map out the domain in pressure- 
temperature space of the suggested phase V in H2 and HD up to 388 
gigapascals at 300 kelvin, and up to 465 kelvin at 350 gigapascals; we 
do not observe phase V in deuterium (D2). However, we show that the 
transformation to phase IV’ in D, occurs above 310 gigapascals and 
300 kelvin. These values represent the largest known isotropic shift in 
pressure, and hence the largest possible pressure difference between 
the H, and D, phases, which implies that the appearance of phase 
V of D2 must occur at a pressure of above 380 gigapascals. These 
experimental data provide a glimpse of the physical properties of 


dense hydrogen above 325 gigapascals and constrain the pressure and 
temperature conditions at which the new phase exists. We speculate 
that phase V may be the precursor to the non-molecular (atomic 
and metallic) state of hydrogen that was predicted 80 years ago. 

The exchange interaction, a purely quantum mechanical effect, 
forms one of the strongest bonds in chemistry—the H-H bond. 
Owing to this bond, hydrogen exists in molecular form, with atoms 
separated by approximately 0.74 A and a bond dissociation energy 
of approximately 4.52 eV (refs 2, 3) at ambient conditions. The first 
experiments to break this bond* demonstrated that extreme conditions 
are needed to do so; for example, the Hz molecule dissociates only to 
a minor extent at high temperatures (at 3,000 K, the degree of disso- 
ciation is around 10%)°. Another mechanism to break the hydrogen 
bond—pressure—was subsequently proposed'; it was theorized that 
above 250,000 atm (25 GPa), the hydrogen molecules would dissociate, 
forming solid, atomic, metallic hydrogen, an entirely new state of the 
first and simplest element. 

The proposed high-pressure route to an atomic metallic state has 
proved to be one of the great experimental challenges in high-pressure 
physics. Despite the technological advances in high-pressure physics, 
this theoretical prediction has yet to be experimentally confirmed, even 
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(1,917 cm! for Hp, 1,921 cm™! 
for HD and 1,912cm™! for D). 
HD was formed from a mixture 
of Hz (75%) and D, (25%); see 
Methods. The spectra were 
collected using a 647.1-nm 
excitation wavelength. The 
arrows ina and ¢ represent the 
splitting of the L3 mode and 
the onset of IV’ (for clarity see 
Extended Data Fig. 4). 


1S$chool of Physics and Centre for Science at Extreme Conditions, University of Edinburgh, Edinburgh EH9 3JZ, UK. 2Key Laboratory of Materials Physics, Institute of Solid State Physics, 
Chinese Academy of Sciences, Hefei 230031, China. Present address: Center for High Pressure Science & Technology Advanced Research, Shanghai 201203, China. 
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Figure 2 | Relative intensities of the vibrational, low-frequency modes 
and the full-width at half-maximum of the L; mode as a function of 
pressure. a, Relative intensities of the vibrational (v,) and four 
low-frequency modes (L;_4) of hydrogen represented as a percentage of the 
total Raman activity of the sample; error bars reflect the accuracy of the 
measurement (see Methods and Extended Data Fig. 7). The low-frequency 
modes L, and L; disappear at around 325 GPa. b, The full-width at half- 
maximum (FWHM) of the low-frequency mode L; of H, D2 and HD as 
function of pressure; the dashed curve is a guide to the eye. The dashed 
vertical lines in a and b indicate the transformations to phases IV’ and V. 


at pressures (and high temperatures) an order of magnitude higher than 
that originally proposed®*'*. Recently, a new solid phase of dense 
hydrogen—phase IV—was experimentally discovered'*'° at 300K and 
above 230 GPa. This new phase IV exhibits a change in the gradient of 
the fundamental vibrational-mode frequency 11 with respect to pres- 
sure P at a constant temperature T= 300 K, (dv, /dP)7, which leads to 
extremely low values of 1, above 230 GPa; for example, 1, ~2,750 cm! 
at 315 GPa (ref. 16). This value is indicative of a much weaker bond, 
compared to ambient conditions, and is consistent with the bond length 
of approximately 0.82 A (ref. 17). It was observed that phase IV could 
be viewed as a mixed molecular and atomic state and that the complete 
dissociation of the hydrogen molecule is feasible at even higher 
compressions’®. 

To investigate the states of hydrogen above 320 GPa, we conducted 
very high pressure studies on H2, HD (hydrogen deuteride) and D, 
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reaching pressures of 384 + 15 GPa, 388 + 15 GPa and 380 + 15 GPa, 
respectively. These pressures, despite being conservative estimates (see 
Methods and Extended Data Figs 1 and 2), are still among the highest 
pressures reported so far in a diamond anvil cell, and the highest pres- 
sures hydrogen has so far been subjected to in static experiments. On 
the basis of the substantial decrease in intensity of vibrational Raman 
bands, the change of slope of the vibrational-mode frequency with pres- 
sure, and changes in position, width and intensity of the low-frequency 
(<1,300 cm7!) modes, we tentatively infer a transition to a new struc- 
tural configuration—phase V—of H, and HD above 325 GPa, while 
observing phase IV’ of D2 above 310 GPa. We present experimental 
information on the physical properties of the dense hydrogen just below 
400 GPa and provide some constraints on the P-T space of phase V. 
On the basis of the optical changes observed through the phase tran- 
sition, we speculate that the proposed phase V might be the onset of a 
non-molecular state of hydrogen. 

Figure 1 shows the representative Raman spectra of three isotopes 
of hydrogen compressed at 300K. (For the full description of the rel- 
evant experimental details, see Methods and refs 18 and 19; further 
information about the intensities of the modes and frequencies of HD 
as a function of H/D concentration and pressure is provided in ref. 20.) 
Above 220 GPa, all hydrogen isotopes enter phase IV, which is charac- 
terized by sharp, well-defined, low-frequency modes (Fig. 1, marked 
for clarity as L;, Lz and L; in all figures; see also ref. 20) and the presence 
of a second vibrational fundamental mode 12. The appearance of the 
Raman spectra of HD (at similar pressures) is essentially identical to 
those of H2 or D» (refs 16, 18); see Extended Data Fig. 3. When pres- 
sures above 275 GPa are reached (for H; and HD), we observe a change 
in the gradient of the frequency with respect to pressure of the L3 mode, 
and its branching to produce a new L4 mode. These changes mark the 
appearance of phase IV’, described previously'®. It was suggested that 
phase IV’ could structurally resemble phase IV, on the basis of close 
similarities between the Raman spectra!®. Above 320 GPa we observe 
gradual, but profound, modification in the Raman spectra, indicative 
of the phase transformation to a new phase, phase V (Hz and HD only). 
The pressure needed to enter phase IV’ in deuterium is 35 GPa higher, 
as evidenced by the splitting of L3 into L3 and L, at 310 GPa (Fig. 1, 
Extended Data Fig. 4). 

In hydrogen, after branching to produce the L4 mode, the L3 
mode slowly redistributes its intensity into the Ly mode (Figs 1, 2 
and Extended Data Fig. 4). When the suggested phase V is reached, 
the L; mode completely disappears and the intensity of Ls becomes 
comparable to that of the L; mode (Figs 1 and 2). Meanwhile, the 
L; mode undergoes a marked change itself; Fig. 2b shows the full- 
width at half-maximum (FWHM) of L; as function of pressure. At the 
same pressure as when the vibrational Raman modes start to become 
weaker and the Lj and L3 modes disappear (>325 GPa; see Figs 1 
and 2), the FWHM of the L; mode starts to increase rapidly. Between 
330 GPa and 388 GPa the width of the L; mode increases more than 
twofold, reaching 180cm7! by 388 GPa (Fig. 2b). Even though the 
L; mode is very broad at the highest pressures, it remains the dom- 
inant feature of the spectra of all isotopes (Fig. 1). We also observe 
some small but detectable softening of the L; frequency with 
pressure (Fig. 3). 

Up to 325 GPa, the total Raman intensity of all modes stays roughly 
the same (Fig. 2) for all three isotopes, in agreement with previous 
studies'® of pure H2 up to 315 GPa. However, when pressures above 
325 GPa are reached, the low-frequency modes L and L; disappear and 
the intensities of both vibrational excitations of H, and HD start to 
decrease rapidly. In the case of hydrogen, the v, modes become almost 
indistinguishable from the background above 358 GPa, whereas the 1, 
mode becomes broad and weak, overlapping with the second-order 
diamond band (Fig. 1a); the positions of the hydrogen and deuterium 
vibrational modes are clearly visible in all spectra (Fig. 1). The sec- 
ond-order diamond mode spanning the approximate range 2,300- 
2,600 cm! overlaps in frequency with the 1; mode of all isotopes, 
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Figure 3 | Frequencies of the vibrational and low-energy modes of the 
isotopes as functions of pressure. a—c, The data for the different isotopes 
are shown as open stars (D.; a), triangles with centre dots (HD; b) and 
open circles (H); c), with different colours representing different 
experimental runs. Data from previous studies!®'® are shown as grey 
symbols. The vertical dashed lines denote the phase transitions in the 


which makes the estimation of its intensity difficult. In the case of HD, 
it became impossible to distinguish between the second-order diamond 
mode and the , mode above 350 GPa (Fig. 1b). The notable decrease 
of the vibrational-mode intensity means that the spectra of the sug- 
gested phase V looks highly unusual, particularly when compared with 
those of phase IV, in which the vibrational mode dominates (see 
>270-320-GPa spectra of all isotopes in Fig. 1). As well as the pro- 
nounced drop of the intensities of the vibrational modes, we observe a 
change in the slope of the 1; frequency with pressure (dv, /dP)7 at 
around 325 GPa for hydrogen and hydrogen deuteride (Fig. 3). The 1 
mode softens rapidly with pressure in phase IV (average gradient of 
—12cm7!GPa7}; ref. 16) and changes to a rate of about —7 cm~!GPa™! 
(refs 16, 18) in phase IV’. When hydrogen is compressed to more than 
325 GPa, the softening of the 1, mode essentially stops and (dv, / dP) 
becomes almost independent of pressure (equal to —1.37cm~!GPa~'), 
resembling that of the v, mode (—1.01 cm~! GPa); see Fig. 3. The 
change of slope and the sudden increase of the FWHM of the L; mode 
happen at the same pressure, suggesting that the nature of the bonding 
is noticeably modified by the transition between phase IV(IV’) and V’. 
Ina recent Raman optical study”!, a small change in the slope of the 
vibrational mode of the H>2 vibrational mode at 300 GPa was observed, 
from which three structural phase transitions within 50 GPa (275- 
325 GPa) were inferred. However, our data do not seem to support these 
findings (Extended Data Fig. 5). 

The pressure at which phase IV of deuterium appears is about 10 GPa 
higher than that of hydrogen’, whereas the transition from phases IV 


Pressure (GPa) 


corresponding isotope. d, The theoretically calculated frequencies of 
hydrogen for the metallic and non-molecular (atomic) structures of I4;/amd 
(red) and R3m (yellow) from ref. 22. The insets are photos of the HD sample 
at 50 GPa and at 388 GPa, as labelled, taken in transmitted and reflected 
light. 


to IV’ is shifted by 35-40 GPa. We observe similar qualitative changes in 
the slope of the deuterium vibrational mode at 310 GPa upon entrance 
into phase IV’, but the slope remains relatively steep, resulting in the 
extremely low vibrational frequency of approximately 2,100 cm! at 
380 GPa. The large pressure difference between phase IV’ of hydrogen 
and that of deuterium suggests that phase V of deuterium will appear 
at pressures above 380 GPa. 

We investigated the P-T space where phase IV(IV’) and the proposed 
phase V exist by conducting heating experiments. If hydrogen is heated 
at 250 GPa, then the phase IV < I transformation happens at 430 K, 
and at 450 K phase I presumably melts (see Fig. 4 and the figures in 
ref. 14). In some runs, phase IV(IV’) was heated at pressures above the 
I-IV-liquid" triple point—for example, at 262 GPa to approximately 
450K and at 270 GPa and 290-310 GPa to approximately 375 K—but no 
transformations to phase V were observed (Fig. 4). Finally, we heated 
phase V at 350 GPa and did not observe any transformation up to 465 K 
(Extended Data Fig. 6). These points in P-T space indicate that phase 
V is separated from the lower-pressure phase IV(IV’) by a phase line 
that is probably close to vertical (Fig. 4). 

The decrease in the vibrational-mode intensities could indicate the 
loss of sample, particularly in the case of hydrogen, but the obser- 
vations described above rule this out and instead indicate a possible 
phase transition. These observations include: the evolution of the low- 
frequency modes (that is, broadening and frequency change) with 
pressure up to 390 GPa, and only a modest drop in the intensity of the 
L; mode; the noticeable change in the slope of the vibrational-mode 
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frequency with pressure at 325 GPa at 300 K; and the lack of sample 
loss or detection of transformation upon heating at pressures above 
320 GPa. In heating experiments, rapid sample loss is observed in the 
liquid state, which results in the complete disappearance of all hydrogen 
Raman activity and the resulting spectra resemble those of the gasket 
(Extended Data Fig. 6). 

It is tempting, although highly speculative at this time, to interpret 
phase V as the onset of the predicted! non-molecular and metallic state 
of hydrogen. Ab initio random-structure searches that included zero- 
point motion estimate that hydrogen should dissociate into atomic and 
metallic states at around 500 GPa (ref. 22) and 380 GPa (ref. 23), respec- 
tively. The possible lowest-energy structural candidates include the 
tetragonal 14,/amd and trigonal R3m symmetries”*”>. Both structures 
have inter-atomic Raman phonons with frequencies of about 
2,500 cm~! at 500 GPa, which is close to the frequency of the vibrational 
mode 1; of hydrogen that we observe at 380 GPa (see Fig. 3 and sup- 
plementary information in ref. 22). In calculations, these phonons are 
present up to 4.2 TPa, slowly increasing in energy with increasing pres- 
sure” as the distance between the atoms decreases. The presence of the 
extremely weak 1 mode at 384 GPa (Fig. 1a) indicates that the purely 
atomic state was not reached in our experiments and that slightly higher 
pressures are required to completely dissociate hydrogen. It is plausible 
that the molecular dissociation commences at pressures above 350 GPa, 
resulting in the alterations of the Raman spectrum as described here. 
If the suggested phase V is indeed the beginning of the complete molec- 
ular dissociation of partially molecular phase IV’, then it could explain 
all the optical observations presented here, such as the band gap 
decreasing with pressure (1.8 eV at 315 GPa in phase IV’, ref. 16). 
Furthermore, the possible appearance of conducting electrons due to 
dissociation could explain the very dark appearance of the sample as 
seen in transmitted and reflected light in the visible region (Fig. 3d, 
inset), and the overall decrease of the Raman intensities. The relatively 
simple overall Raman spectrum observed experimentally matches 
those predicted theoretically rather well (Fig. 3d). The [4,/amd 
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Figure 4 | Proposed phase diagram of hydrogen up to 

400 GPa. The coloured, filled symbols and solid phase lines 
below 300 GPa in the main figure are from ref. 14, and show 
phases I(I’), II] and IV(IV’). The solid black diamonds (phase 
V) are from this study, the vertical, grey dashed line indicates 
the transition from phase IV’ to phase V and the dashed- 
dotted arrow is the proposed continuation of the melting 
curve. The inset shows a sketch of the phase diagram of D3. 
The coloured, filled symbols and solid lines were obtained 
by us in another, unreported study. The dashed lines are the 
proposed melting curves of deuterium, which have not been 
measured experimentally, but are assumed to follow the 
same trend as those of hydrogen. The red open triangles are 
from ref. 26 and separate the metallic and semi-conducting 
liquids. 
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symmetry does not predict a very prominent L; mode, whereas the 
R3m symmetry, which is more energetically favourable at even higher 
pressures, does not predict the Ly mode (see Fig. 3b, c), both of which 
are observed experimentally. However, these discrepancies could be 
accounted for by the 100-GPa pressure difference between theory and 
experiment. A minimum in (dv, /dP); would indicate the evolution 
from the intramolecular vibrational mode to an interatomic phonon. 
This change could require a 100-GPa pressure range to complete and 
would result in hardening of the phonons at pressures above 500 GPa. 

The data from this and a previous melting study" provide further 
insight into the current phase diagram of hydrogen (Fig. 4). It appears 
that there could be another triple point between the proposed phase 
V, phase IV(IV’), and a liquid state (not shown) at above 275 GPa and 
450K. If phase V is indeed a precursor to a fully non-molecular, and 
presumably metallic, solid state, then a question arises about the exist- 
ence and location of the phase line separating the molecular (insulat- 
ing) and non-molecular (metallic) liquids and solids. A non-molecular 
liquid could be expected to exist in the same pressure range as phase V, 
but at higher temperatures. In fact, theoretical studies have sug- 
gested a phase transition from a molecular liquid to an atomic liq- 
uid in hydrogen**”*. The data presented in refs 24 and 25 suggest 
the existence of the highly conducting atomic liquid state at pres- 
sures as low as about 150 GPa and above 2,000 K (ref. 25). However, 
shock-wave experiments”® indicate the existence of the metallic liq- 
uid deuterium at higher pressures of 350 GPa, with the corresponding 
phase line being almost vertical (Fig. 4, inset). These experimental 
results seem to be in a very good agreement with our current study. 
Extrapolation of the data from ref. 26 to lower temperatures would 
imply yet another triple point between the melting curve and the 
two liquid phases. The presence of two dissimilar liquids would 
suggest the presence of two solid phases below them, with proper- 
ties mimicking those of the liquids, for example, non-molecular 
(insulating) versus atomic (metallic). Experimental confirmation 
of the location of the phase line(s) and triple points would be very 
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important for the complete description of phase V, even higher- 
pressure solid phases and the possible molecular—atomic transition. 
An understanding of the connection between the proposed metalliza- 
tion and phase V is also required. Such additional data could provide 
invaluable information about the fundamental physics and chemistry 
that governs the behaviour of the simplest element at high densities. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Sample loadings. The experimental runs used mostly the same techniques and 
method described in refs 16, 18, 19 and references therein. For this study we con- 
ducted a total of 14 independent experiments up to pressures of 388 + 15 GPa. In 
some of the runs we heated the sample, at different pressures, up to temperatures 
of 465 K. Pressure was generated in long, high-temperature, piston-cylinder dia- 
mond anvil cells of our own design equipped with diamonds with culet dimensions 
ranging from 30j1m to 151m. The rhenium foils with thicknesses of 200-250 1m 
were used as the gasket material to form the sample chamber. The hydrogen gas 
was clamped at 0.175-0.200 GPa at 300 K and then further compressed to above 
150 GPa, usually within 2-3h after clamping. The HD was produced by mixing 
the pure isotopes in gas phases (usually <10 MPa) at 300K. The partial pressures 
were used to calculate the composition, which, for the experiment on HD described 
here, was 75% and 25% for hydrogen and deuterium, respectively (see also ref. 20). 
Optical measurements. We used 514.15-nm and 647.1-nm excitation wavelengths 
to collect the spectra. Owing to the quantum efficiency of the visible CCD (charged 
coupled device) used, the high-energy modes—for example, hydrogen vibrational 
excitation at above 3,500 cm~!—are much weaker than the low-energy lattice 
modes if probed using a 647.1-nm wavelength. However, in most of the cases, 
when 514.15-nm excitation is used, the pressure-induced fluorescence from the 
stressed diamonds obscures the Raman signal, which leaves 647.1-nm excitation 
as the only available source, as in Fig. 1. 

Pressure and temperature measurements. For pressure measurements, the 
stressed-diamond-edge frequency was used and, where applicable, cross- 
referenced with the frequency of the vibrational modes!” from previous experi- 
ments to maintain self-consistency. An example of how the frequency of the 
stressed diamond edge was determined, and the dependence of the vibrational 
frequency of hydrogen versus the frequency of the stressed diamond edge is given 
in Extended Data Fig. 1a. The first-order diamond Raman band becomes elongated 
in frequency space, composed of two sharp, well-defined peaks: one correspond- 
ing to the stressed culet and the other to the unstressed regions of the diamond. 
The frequency from the stressed culet was determined by the frequency (w) at 
which dI / dw was minimized (where I is the intensity of the spectrum), a technique 
proposed in refs 11, 27, and 28. 

The calibration data presented in refs 27 and 28 were primarily used here for 
determining pressure. These two curves agree up to about 200 GPa, but gradually 
diverge at higher pressures (Extended Data Fig. 2b). For example, at the high- 
est pressure reached, we observed a diamond-edge frequency of 1,936cm ! (see 
Fig. 1), which corresponds to pressures of approximately 388 GPa and 403 GPa on 
the scales proposed by Akahama & Kawamura in 2004”’ and 2006”, respectively. 


With their latest calibration in 2010”, this frequency corresponds to a substan- 
tially higher pressure of 449 GPa (Extended Data Fig. 2b). However, the effect of 
pressures above about 300 GPa on soft samples has yet to be determined; the latest 
calibration”? up to 410 GPa needs to be independently verified, particularly for 
softer samples. To be consistent with previous results, we decided to use the most 
conservative scale’, as was used in our previous studies!®!*". This scale provides 
a smooth continuation of the frequencies of the low-energy and vibrational modes 
versus pressure observed by us in all experiments. 

We therefore stress that the characteristics that provide evidence for the phase 
V transition are independent of the choice of the previously discussed calibrations, 
not a direct consequence. Extended Data Figure 2a demonstrates that the discon- 
tinuous change in dv /dP for pure H) is present when using any of the stressed- 
diamond-edge pressure calibrations, and remains just as prominent when using 
the less-conservative and more-contemporary pressure scales!!”, 

For heating, we used two custom-built resistive heaters placed around the dia- 

monds and the body of the cell. Temperature was determined using one or two 
thermocouples, attached to one of the diamonds and/or the gasket. 
Calculating relative integrated intensities. Calculating the relative Raman intensi- 
ties in the diamond anvil cell is a difficult task, especially when these intensities are 
of similar magnitude to the relatively low signal-to-noise ratios. Therefore, the data 
in Fig. 2 are from the spectrum with the highest signal-to-noise ratio in each run. 
First, the background caused by the pressure-induced fluorescence of the diamond 
anvils is subtracted. The residual data are then fitted with Voigt profiles, which 
produces values for the integrated intensities of each excitation. These values are 
then summed, and the percentage of total Raman activity is calculated; an example 
is provided in Extended Data Fig. 7. Owing to the extremely small samples, the 
second-order Raman band also becomes comparable in magnitude to the excita- 
tions from the sample (Extended Data Fig. 7, inset). Consequently, at higher pres- 
sures for which the 1 excitations overlap with the second-order Raman diamond 
band, extra care has to be taken. Here, the evolution of the spectra with pressure 
(which is determined using fits from a previous pressure step as an initial guess) 
as well as the relationship between the intensity of the first- and second-order 
diamond bands are used to accurately determine the integrated intensity of 1. 


27. Akahama, Y. & Kawamura, H. High-pressure Raman spectroscopy of diamond 
anvils to 250 GPa: method for pressure determination in the multimegabar 
pressure range. J. Appl. Phys. 96, 3748-3751 (2004). 

28. Akahama, Y. & Kawamura, H. Pressure calibration of diamond anvil Raman 
gauge to 310 GPa. J. Appl. Phys. 100, 043516 (2006). 

29. Akahama, Y. & Kawamura, H. Pressure calibration of diamond anvil Raman 
gauge to 410 GPa. J. Phys. Conf. Ser. 215, 012195 (2010). 
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Extended Data Figure 1 | Calculating pressure. a, A typical example of a 
spectrum from the first-order Raman band of diamond when probing the 
sample (orange). The frequency edge is given by the vertical dashed line 
at 1,796 rel. cm~!, which corresponds to a pressure of 275 GPa (ref. 27). 
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This stressed edge is defined as the frequency that minimizes dI/dw 
(purple). b, H, vibrational-mode (vibron) frequency (1) plotted as a 
function of the stressed-diamond-edge frequency. 
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Extended Data Figure 3 | HD compressed to 218 GPa. Representative 
Raman spectra from HD, a mixture of hydrogen (75%) and deuterium 
(25%), as a function of pressure at 300 K. The spectra show the evolution 
of the 1, vibrational modes of HD (labelled ‘HD-1,’) and H) (labelled 
“H2-1;’) from loading at 0.5-218 GPa, as labelled. Above 47 GPa, there is 
an observed transfer of integrated intensity from the 1, band of H; to the 
v, band of HD, with the latter vibrational mode becoming stronger than 
the former at 150 GPa, and the only resolvable 1, band above 218 GPa. The 
spectra were collected using a 514-nm excitation wavelength. The spectra 
from this run above 218 GPa are shown in Fig. 1b. 
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Extended Data Figure 4 | Low-energy-mode splitting from phase IV to pressure during the transition from phase IV to phase IV’. The low- 
phase IV’. Representative Raman spectra of the low-frequency excitations frequency mode L; splits to produce mode Ly. 
of the three isotopes (left, H2; centre, D2; right, HD) as functions of 
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Extended Data Figure 5 | Comparison with previous data. a, Frequencies _ data for D2. b, Representative Raman spectra of hydrogen from ref. 21 


of the vibrational modes versus pressure from ref. 21 (black circles) and (black) and ref. 16 (violet). The dashed vertical line indicates the lowest 
the current study and our previous study’® (violet squares). The open vibrational-mode frequency (and therefore the highest pressure) observed 
symbols represent data for H); the symbols enclosing pluses represent in ref. 21. 
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Extended Data Figure 6 | Heating at about 360 GPa. a, Raman spectra 
for a pure hydrogen sample, taken using a probe laser with a wavelength 
of 647 nm, as function of temperature at pressures between 367 GPa 
and 350 GPa (black). The Raman spectrum collected 2 |1m away on 

the rhenium gasket is shown in red. The vertical dashed lines indicate 
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the frequency space occupied by the second-order diamond band. DE, 
diamond edge. b, Example spectrum of the sample (black) and the gasket 
(2 um away, red), collected at 361 GPa, and the difference between them 
(blue). 
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Partially oxidized atomic cobalt layers for carbon 
dioxide electroreduction to liquid fuel 


Shan Gaol, Yue Lin', Xingchen Jiao!, Yongfu Sun!?, Qiquan Luo!, Wenhua Zhang’, Diandi Li, Jinlong Yang! & Yi Xie! 


Electroreduction of CO, into useful fuels, especially if driven 
by renewable energy, represents a potentially ‘clean’ strategy for 
replacing fossil feedstocks and dealing with increasing CO, emissions 
and their adverse effects on climate!~*. The critical bottleneck lies in 
activating CO, into the CO," radical anion or other intermediates 
that can be converted further, as the activation usually requires 
impractically high overpotentials. Recently, electrocatalysts based 
on oxide-derived metal nanostructures have been shown** to enable 
CO); reduction at low overpotentials. However, it remains unclear 
how the electrocatalytic activity of these metals is influenced by 
their native oxides, mainly because microstructural features such 
as interfaces and defects? influence CO, reduction activity yet 
are difficult to control. To evaluate the role of the two different 
catalytic sites, here we fabricate two kinds of four-atom-thick layers: 
pure cobalt metal, and co-existing domains of cobalt metal and 
cobalt oxide. Cobalt mainly produces formate (HCOO7 ) during 
CO, electroreduction; we find that surface cobalt atoms of the 
atomically thin layers have higher intrinsic activity and selectivity 
towards formate production, at lower overpotentials, than do surface 
cobalt atoms on bulk samples. Partial oxidation of the atomic layers 
further increases their intrinsic activity, allowing us to realize stable 
current densities of about 10 milliamperes per square centimetre 
over 40 hours, with approximately 90 per cent formate selectivity at 
an overpotential of only 0.24 volts, which outperforms previously 
reported metal or metal oxide electrodes evaluated under comparable 
conditions’”*”!°, The correct morphology and oxidation state can 
thus transform a material from one considered nearly non-catalytic 
for the CO electroreduction reaction into an active catalyst. These 
findings point to new opportunities for manipulating and improving 
the CO, electroreduction properties of metal systems, especially once 
the influence of both the atomic-scale structure and the presence of 
oxide are mechanistically better understood. 

To explore the catalytic role of metal sites and metal oxide sites, we 
first construct a system containing a 4-atom-thick metal layer and then 
create the corresponding metal oxide on its surface. The atomic thick- 
ness of this model system ensures that most of the metal atoms are 
present as either surface atoms or surface ions!"!”, so we can explore 
how the presence of a surface oxide influences the catalytic activity 
of the corresponding metal. We focus on cobalt (Co) because metals 
with loosely bonded d electrons and the resulting high electrical con- 
ductivity are promising for CO, reduction’, and because both Co and 
its oxide are widely used catalysts'*'®. Importantly, the spontaneous 
oxidation of Co nanostructures in air is relatively slow!’, and the use of 
other gases makes it possible to manipulate and control the oxidation 
process!®. However, strong in-plane bonds and the lack of an intrinsic 
driving force for two-dimensional anisotropic growth make the syn- 
thesis of Co atomic layers and the controlled conversion of such layers 
into partially oxidized atomic layers very challenging. 

We produced freestanding 4-atom-thick Co sheets with and without 
surface Co oxide using a ligand-confined growth strategy, in which 


the use of dimethylformamide and n-butylamine proved crucial for 
reducing the metal ions and enforcing a sheet-like morphology, respec- 
tively (Extended Data Fig. 1). The starting reagent cobalt(III) acetylac- 
etonate, Co(acac)s, initially hydrolyses into [Co(H20)¢]**, on which 
n-butylamine is adsorbed to reduce surface energy and avoid aggrega- 
tion (Fig. la and Extended Data Fig. 2a) until sheet-like products grad- 
ually appear during the subsequent condensation process (Extended 
Data Fig. 2b). Controlled fabrication of either partially oxidized or pure 
Co atomic layers is achieved by using dimethylformamide to gradually 
reduce the cobalt ions’® (Fig. 1a), illustrated by the products obtained 
at 220°C after reaction times of either 3h or 48h. 

Transmission electron microscope (TEM) images of the product 
obtained after 3h reveal a sheet-like morphology, while the powder 
X-ray diffraction (XRD) pattern can be readily indexed to hexagonal 
Co (Extended Data Fig. 3a, b). High-resolution TEM images demon- 
strate that the majority of these two-dimensional sheets correspond to 
the [001]-oriented hexagonal Co (Fig. 2a, b and Extended Data Fig. 3c), 
with average sheet thickness of 0.84nm determined with atomic force 
microscopy (Fig. 1b, c), close to the 0.82-nm thickness of a 4-atom-thick 
Co slab along the [001] direction. Lateral high-angle annular dark-field 
scanning transmission electron microscopy (HAADF-STEM) imaging 
confirms the 4-atom layer thickness (Fig. 1d-g). 

However, the high-resolution TEM images in Fig. 2a and c also reveal 
the presence of another distinct structural domain with an interplanar 
spacing of 0.205 nm and dihedral angle of 90°. This domain cor- 
responds to the (400) plane of cubic Co30,, which is embedded in 
the metallic Co lattice (see schemes in Fig. 2d and e; another high- 
resolution TEM image from a larger area of an individual sheet show- 
ing Co oxide embedded in Co metal is provided in Extended Data 
Fig. 3c). Elemental mapping (Fig. 2f-h) supports the conclusion that 
Co metal and Co oxide co-exist in this sample. This is also consistent 
with the observation of micro-Raman peaks at 482 cm~!, 523cm7!, 
621cm | and 694cm_! (Fig. 2), which correspond to a Co3O4 phase” 
and that disappear upon increasing the reaction time to 48h. Taken 
together, these observations demonstrate that the 4-atom-thick Co 
sheets obtained after 3 h at 220°C contain some Co oxide, whereas 
increasing the reaction time to 48h leads to the formation of pure Co 
4-atom-thick layers. 

To characterize the performance of the materials as CO2 reduction 
electrocatalysts, they were loaded onto a glassy carbon electrode that 
served as the working electrode; linear sweep voltammetry was carried 
out using a CO>-saturated 0.1 M Na2SO, solution in a three-electrode 
set-up. Use of the partially oxidized Co 4-atom-thick layers generates 
a current density of 10.59 mA cm at —0.85 V versus a saturated calo- 
mel electrode (SCE) (Fig. 3a), roughly 10, 40 and 260 times larger than 
the current densities obtained with the pure-Co 4-atom-thick layer 
(Extended Data Fig. 4), the partially oxidized bulk Co and bulk Co 
(Extended Data Fig. 5), respectively. 

Quantification of the solution-phase products by 'H nuclear mag- 
netic resonance (NMR) shows that of the four samples, the partially 
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Figure 1 | Synthetic scheme and characterizations 
of Co 4-atom-thick layers with and without surface 
oxide. a, Schematic formation process of the partially 
oxidized and pure-Co 4-atomic-layer, respectively. 
b-g, Characterizations for the partially oxidized Co 
4-atomic-layers: atomic force microscopy image (b) 
and the corresponding height profiles (c) (we note 
that the numbers from 1 to 3 in ¢ correspond to the 
numbers from 1 to 3 in b), lateral HAADF-STEM 
image (d) and the corresponding intensity profile 
along the pink rectangle in d, directly showing 

the 4-atom thickness of the layer (e), and the 
corresponding crystal structures (f, g). a.u., 
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oxidized Co 4-atom-thick layers attain the highest Faradaic efficiency 
for formate production of 90.1% at —0.85 V versus SCE (Fig. 3b and 
Extended Data Fig. 6a, b)*!. H2 evolution, quantified by gas chroma- 
tography, accounts for the remaining ~10% of the charges passed. The 
linear sweep voltammetry curves in N>-saturated 0.1 M Na2SO, solu- 
tion indicate that the ultrathin structure and the presence of surface Co 
oxide also increase the H2O reduction activity of the catalyst system 
(Fig. 3a). A '?CO, labelling experiment involving the same set-up and 
8-h electrolysis yielded a product that generates an obvious ’C NMR 
peak at 168.5 parts per million, attributed to H3COO~, anda ‘H NMR 
doublet, corresponding to the proton coupled to the °C of H'3COO7 
(Extended Data Fig. 6c, d)**, confirming that formate is indeed derived 
from CQ. 

Taking the reduction potential of E=—0.61 V versus SCE for the 
CO2/HCOO~ couple in CO2-saturated 0.1 M Na2SOy solution’, the 
potential of —0.85 V versus SCE used in our experiments corresponds 
to an overpotential of 0.24 V. To the best of our knowledge, achieving 
at such a low overpotential a current density as high as 10.59 mAcm~” 
and 90.1% formate selectivity has not been possible with any of the 
previously reported metal or metal oxide electrodes evaluated under 
comparable conditions”, Intriguingly, when using the partially 
oxidized Co 4-atom-thick layers as the working electrode, CO, reduc- 
tion initiated at —0.68 V versus SCE with a measured Faradaic effi- 
ciency for formate formation of 2.3% (Fig. 3a, b). This corresponds to 
an overpotential of only 0.07 V, comparable to that achieved with highly 
active Pd nanoparticles dispersed on a carbon support”1. 


The remarkably enhanced activity of the ultrathin nanostructured 
catalysts is partly due to their increased electrochemical surface area 
(ECSA; see Methods for details) that provides a larger number of 
catalytically active sites. The fivefold ECSA increase from bulk Co to Co 
4-atom-thick layers is an important contributor to the 26-fold increase 
in catalytic activity (Fig. 3a, c). Interestingly, the partially oxidized Co 
4-atom-thick layers exhibited nearly the same ECSA as the Co 4-atom- 
thick layers, yet also a tenfold higher catalytic activity that must there- 
fore be due to the presence of intrinsically more active sites associated 
with the Co oxide. This conclusion is supported by the relative perfor- 
mance of partially oxidized bulk Co and intact bulk Co (Fig. 3a, c), and 
by the finding that catalytic current densities gradually increase as the 
amount of Co oxide in the Co 4-atom-thick layers increases (Extended 
Data Fig. 7). During 40-h electrocatalysis tests, the partially oxidized 
Co 4-atom-thick layers show negligible decay in current density while 
maintaining a formate Faradaic efficiency of approximately 90% (Fig. 3d 
and Extended Data Fig. 6b). This suggests good stability, which is also 
confirmed by XRD, Raman and TEM characterization before and 
after use (Extended Data Fig. 8). Stability testing for Co 4-atom-thick 
layers and bulk Co also indicates that they do not undergo obvious 
oxidation or corrosion during long-term electrolysis (Extended Data 
Figs 9 and 10). 

Volumetric CO, adsorption measurement, carried out to explore the 
reason for the enhanced electrocatalytic activity observed, reveals that 
the Co 4-atom-thick layer system absorbs more CO, than its bulk coun- 
terpart and that partial oxidation increases CO adsorption capacity 
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Figure 2 | Characterizations for the partially oxidized 
Co 4-atom-thick layers obtained at 220°C for 3 h. 

a, High-resolution TEM image. b, c, Enlarged high- 
resolution TEM images. d, e, The related schematic atomic 
models, clearly showing distinct atomic configuration 
corresponding to hexagonal Co and cubic Co304. 

f-h, Elemental mapping. i, Micro-Raman spectra for 

the products obtained at 220°C for 3h (red line) and 48h 
(black line). 


further (Fig. 4a). This suggests that the change in oxidation state and 
increase in surface area synergistically favour CO? adsorption, the pre- 
requisite first event before further reduction reactions can take place. It 
is commonly accepted that on metal electrodes***”34, the adsorbed 
CO) is initially reduced to the CO2"~ intermediate, which could then 
react further according to: 


CO, (g) + * + CO2* (1) 
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CO,*-* + Ht +e + HCOO* (3) 


HCOO-* — HCOO™ + * (4) 
where the asterisk denotes a catalytically active site and e” is an elec- 
tron. ECSA-corrected Tafel slopes for formate production (see Methods 
for details) catalysed by the partially oxidized bulk Co and by bulk 
Co are both close to 118 mV per decade of current (Fig. 4b), indic- 
ative of the involvement of a rate-limiting le~ transfer from CO) to 
CO," (refs 5-8). 


Figure 3 | Electroreduction of CO, to formate. Data are 
shown for partially oxidized Co 4-atom-thick layers 
(red), Co 4-atom-thick layers (blue), partially oxidized 
bulk Co (violet) and bulk Co (black). a, Linear sweep 
voltammetric curves in a CO>-saturated (solid line) 

and N>-saturated (dashed line) 0.1 M Na2SO, aqueous 
solution. b, Faradaic efficiencies of formate at each given 
potential for 4h. c, Charging current density differences 
Aj plotted against scan rates. d, Chrono-amperometry 
results at the corresponding potentials (in b) with the 
highest Faradaic efficiencies. The error bars in b and c 
represent the standard deviations of five independent 
measurements of the same sample. 
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In contrast, corresponding Tafel slopes of close to 59 mV per decade 
obtained with the partially oxidized Co 4-atomic-layers and Co 4-atomic- 
layers are compatible with a reduction mechanism encompassing a 
fast pre-equilibrium involving le~ transfer to form CO,*~ and a sub- 
sequent slower chemical reaction as the rate-determining step**. If this 
is indeed the case, it appears that Co atoms confined in atomic layers are 
able to facilitate CO, activation by stabilizing the CO,"" intermediate 
more effectively than can be achieved by their bulk counterpart. We 
speculate that the further decrease in Tafel slope from 55 mV per dec- 
ade to 44mV per decade and the lowering of the onset potential from 
0.73 V to 0.68 V upon partial oxidation of the Co 4-atom-thick layers 
(Figs 3b and 4b) might be due to Co oxide facilitating the rate-deter- 
mining chemical reaction, probably the H* transfer step (equation (3)). 
We note, however, that the reaction mechanism remains uncertain and 
that further efforts are needed to gain in-depth understanding of the 
individual steps involved. 

Our synthetic strategy has allowed us to produce a well controlled 
model system to explore the influence of both atomic-scale structure 
and the presence of an oxide on the activity of a metal catalyst. ECSA- 
corrected Tafel plots and Faradaic efficiencies clearly demonstrate that 
Co-based catalysts in the form of 4-atom-thick layers exhibit higher 
intrinsic activity and selectivity for formate production at lower over- 
potentials than the bulk material, and that partial oxidation improves 
the intrinsic activity of the system significantly further. Thus the 
appropriate morphology and oxidation state can transform a material 
considered nearly non-catalytic for CO reduction into a very active 
and robust catalyst, calling for a re-thinking of accepted strategies for 
developing efficient CO, electroreduction catalysts. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Synthesis of partially oxidized Co 4-atom-thick layers. In a typical procedure, 
100 mg Co(acac)3 was added into a solution of 20 ml dimethylformamide, 4ml H,O 
and 1 ml n-butylamine. After vigorous stirring for 15 min, the mixture was trans- 
ferred into a 40-ml Teflon-lined autoclave (before this experiment, the autoclave 
was initially treated with 20 ml n-butylamine at 120°C for 6h, and then treated 
with 35 ml H2O at 120°C for 12h), sealed and heated at 220°C for 3h. The system 
was allowed to cool down to room temperature naturally, and the final product 
was collected by centrifuging the mixture, washed with cyclohexane and absolute 
ethanol (1:4) many times, and then dried in vacuum for further characterization. 
Synthesis of Co 4-atom-thick layers. In a typical procedure, 100 mg Co(acac)3 
was added into a solution of 20 ml dimethylformamide, 4m] HO and 1 ml 
n-butylamine. After vigorous stirring for 15 min, the mixture was transferred into 
a 40-ml Teflon-lined autoclave (before this experiment, the autoclave was initially 
treated with 20 ml n-butylamine at 120°C for 6h, and then treated with 35 ml H,O 
at 120°C for 12h), sealed and heated at 220°C for 48h. The system was allowed 
to cool down to room temperature naturally, and the final product was collected 
by centrifuging the mixture, washed with cyclohexane and absolute ethanol (1:4) 
many times, and then dried in vacuum for further characterization. 

Synthesis of partially oxidized bulk Co. In a typical procedure, 1.0 g CoCl;-6H30 
was added into a solution of 15 ml ethylene glycol and 15 ml ethylenediamine. After 
vigorous stirring for 15 min, the mixture was transferred into a 40-ml Teflon-lined 
autoclave, sealed and heated at 220°C for 12h. The system was allowed to cool 
down to room temperature naturally, and the final product was collected by cen- 
trifuging the mixture, washed with cyclohexane and absolute ethanol (1:4) many 
times, and then dried in vacuum for further characterization. 

Synthesis of bulk Co. In a typical procedure, 1.0 g CoCl:-6H2O was added into a 
solution of 15 ml ethylene glycol and 15 ml ethylenediamine. After vigorous stir- 
ring for 15 min, the mixture was transferred into a 40-ml Teflon-lined autoclave, 
sealed and heated at 220°C for 24h. The system was allowed to cool down to room 
temperature naturally, and the final product was collected by centrifuging the mix- 
ture, washed with cyclohexane and absolute ethanol (1:4) many times, and then 
dried in vacuum for further characterization. 

Characterization. The field emission scanning electron microscope (SEM) 
images were performed by using a FEI Sirion-200 SEM. XRD patterns were 
recorded by using a Philips X Pert Pro Super diffractometer with Cu Ka radiation 
(A= 1.54178 A). Atomic force microscopy in the present work was performed using 
a Veeco DI Nano-scope MultiMode V system. The TEM was carried out on a JEM- 
2100F field emission electron microscope at an acceleration voltage of 200 kV. The 
high-resolution TEM, HAADF-STEM and the corresponding energy dispersive 
spectroscopy mapping analyses were performed on a JEOL JEM-ARM200F TEM/ 
STEM with a spherical aberration corrector. Raman spectra were detected on a 
RenishawRM3000 Micro-Raman system. 

Electrochemical measurements. Electrochemical measurements were carried 
out in a three-electrode system at an electrochemical station (CHI760E). Typically, 


a 10-mg sample and 401] of Nafion solution (5 wt%) were dispersed in 1 ml of 
water-ethanol solution with a volume ratio of 3:1 by sonicating for 1h to forma 
homogeneous ink. Then, 3011 of the dispersion was loaded onto a glassy carbon 
electrode with diameter 12 mm. For CO; reduction experiments, linear sweep 
voltammetry with a scan rate of 20mVs~! was conducted in CO>-saturated 0.1 M 
Na2SO, solution (60 ml, pH ~ 6) (the NaSO, electrolyte was purged with CO2 
for 30 min before the measurement). For comparison, linear sweep voltammetry 
with a scan rate of 20mVs7! was also conducted in N2-saturated 0.1 M Na2SO4 
solution. The glassy carbon electrode served as the working electrode. The counter 
and the reference electrodes were the platinum gauze and the SCE reference elec- 
trode, respectively. The liquid products were quantified by NMR (Bruker AVANCE 
AV III 400) spectroscopy, in which 0.5 ml electrolyte was mixed with 0.1 ml D,O 
(deuterated water) and 0.0511 dimethyl sulfoxide (DMSO, Sigma, 99.99%) was 
added as an internal standard. The one-dimensional 'H spectrum was measured 
with water suppression using a pre-saturation method. The evolved gas products 
were detected using an Agilent Technologies 7890B gas chromatograph. 

ECSA = R;S, in which S stands for the real surface area of the smooth metal elec- 
trode, which was generally equal to the geometric area of glassy carbon electrode 
(in this work, S=1.13 cm~?). The roughness factor R¢ was estimated from the ratio 
of double-layer capacitance Cy for the working electrode and the corresponding 
smooth metal electrode (assuming that the average double-layer capacitance of a 
smooth metal surface is 20;1F cm~*)*, that is, Re= Cq/20 pF cm~”. The Cg was 
determined by measuring the capacitive current associated with double-layer 
charging from the scan-rate dependence of cyclic voltammetric stripping. For 
this, the potential window of cyclic voltammetric stripping was —0.3 V to —0.2 V 
versus SCE (0.1 M NajSO, solution). The scan rates were 20mVs~!,30mVs"}, 
50mVs_!,80mVs_/, 100mVs! and 120mVs~!. The Cy was estimated by plot- 
ting the Aj=(j, —jc) at —0.25 V (where j, and j, are the cathodic and anodic 
current densities, respectively) versus SCE against the scan rate, in which the slope 
was twice that of Cy. ECSA-corrected Tafel slopes for formate production (that 
is, jrotal X formate/ ECSA) were calculated from the corresponding ECSA-corrected 
current densities for formate according to the linear sweep voltammetry curves 
and the formate Faradaic efficiency (formate). The Faradaic efficiency of formate 
was calculated from the total amount of charge Q (in units of coulombs) passed 
through the sample and the total amount of formate produced njormate (in moles). 
Q=Ix t, where I (in amperes) is the reduction current at a specific applied poten- 
tial and t is the time (in seconds) for the constant reduction current. The total 
amount of formate produced was measured using NMR (Bruker AVANCE AV 
III 400) spectroscopy. Assuming that two electrons are needed to produce one 
formate molecule, the Faradaic efficiency can be calculated as follows: Faradaic 
efficiency =2F X nformate/ Q=2F X Nformate/(I x t), where F is the Faraday constant. 


25. Popezyk, M., Serek, A & Budniok, A. Production and properties of composite 
layers based on an Ni-P amorphous matrix. Nanotechnology 14, 341-346 
(2003). 
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Extended Data Figure 1 | Characterizations for the comparable 


products. a, b, SEM image (a) and XRD pattern (b) for Co(OH), sheets. 


c, d, TEM image (c) and XRD pattern (d) for large and irregular Co 
particles. In the case where only n-butylamine was present, the reaction 
produced two-dimensional Co(OH), sheets (a, b), whereas the reaction 
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yielded large and irregular Co particles when only dimethylformamide 
was used (c, d). These results indicated that n-butylamine favoured the 
formation of a sheet-like morphology, while dimethylformamide was 
beneficial in reducing the cobalt ions with high oxidation states. JCDPS, 
the Joint Committee on Powder Diffraction Standards. 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


100 nm 


Extended Data Figure 2 | Characterizations for the intermediate products. TEM images for the obtained products at 220°C for 0.5h (a) and 2h (b). 
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Extended Data Figure 3 | Supplementary characterizations for the 
partially oxidized Co 4-atom-thick layers. a, TEM image. b, XRD 
pattern. c, High-resolution TEM image, in which the majority of these 
two-dimensional sheets corresponds to the [001]-oriented hexagonal Co, 
while the other structural domains denoted by red squares correspond to 
the cubic Co30y. 
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Extended Data Figure 4 | Characterizations for the Co 4-atom-thick layers. a, XRD pattern. b, Atomic force microscope image. c, The corresponding 
height profile. Data are shown for the products obtained at 220°C for 48h. 
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Extended Data Figure 5 | Characterizations for partially oxidized bulk Co and bulk Co particles. a, XRD patterns. b, Micro-Raman spectra. c, SEM 
image for partially oxidized bulk Co particles. d, SEM image for bulk Co particles. 
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Extended Data Figure 6 | NMR spectra and formate yield. 
a, Representative NMR spectra of the electrolyte after CO2 reduction 
electrolysis at —0.85 V versus SCE for the partially oxidized Co 4-atom- in b exhibited a variability of <10% for the formate yield. c, d, °C spectra 
thick layers. DMSO is used as an internal standard for quantification of (c) and ‘H-NMR spectra (d) of the electrolyte after 8h ‘CO, reduction 
HCOO .b, Formate yield at the corresponding potentials with the highest _ electrolysis at —0.85 V versus SCE for the partially oxidized Co 4-atom-thick 
Faradaic efficiencies for the partially oxidized Co 4-atom-thick layers, layers. 
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a, XRD patterns. b, Raman spectra. c, Linear sweep voltammetric curves. the synthetic time is increased from 3h to 24h; note that the increased 
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Extended Data Figure 8 | Characterizations for the partially oxidized 
Co 4-atom-thick layers after the 40-h test. a, TEM image for the partially 
oxidized Co 4-atom-thick layers after the 40-h CO) reduction test. 

b, c, XRD patterns (b) and Raman spectra (c) for the partially oxidized 

Co 4-atom-thick layers before and after the 40-h CO> reduction test. 

The samples for the above characterizations were collected as follows: 
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the working electrodes after 40 h of electrolysis were sonicated in ethanol 

for about 3 min and then the samples were collected by centrifuging 

the mixture, washed with cyclohexane and absolute ethanol (1:4) many 

times, and then dried in vacuum. The above process was performed on 

approximately 50 similar working electrodes and all the samples collected 

were used to conduct the above characterizations. 
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Extended Data Figure 9 | XRD patterns and Raman spectra before and after 40-h electrolysis at —0.85 V versus SCE for Co 4-atom-thick layers and 
bulk Co. a, b, XRD patterns (a) and Raman spectra (b) for Co 4-atom-thick layers. c, d, XRD patterns (c) and Raman spectra (d) for bulk Co. 
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Extended Data Figure 10 | XRD patterns and Raman spectra before and after repeating linear sweep voltammetry measurement scanning from 
—0.35 V versus SCE to different potentials (versus SCE) about 300 times. a, b, XRD patterns (a) and Raman spectra (b) for Co 4-atom-thick layers. 
c, d, XRD patterns (c) and Raman spectra (d) for bulk Co. 
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Four-electron deoxygenative reductive coupling of 
carbon monoxide at a single metal site 


Joshua A. Buss! & Theodor Agapie! 


Carbon dioxide is the ultimate source of the fossil fuels that are 
both central to modern life and problematic: their use increases 
atmospheric levels of greenhouse gases, and their availability is 
geopolitically constrained!. Using carbon dioxide as a feedstock 
to produce synthetic fuels might, in principle, alleviate these 
concerns. Although many homogeneous and heterogeneous 
catalysts convert carbon dioxide to carbon monoxide’, further 
deoxygenative coupling of carbon monoxide to generate useful 
multicarbon products is challenging®. Molybdenum and vanadium 
nitrogenases are capable of converting carbon monoxide into 
hydrocarbons under mild conditions, using discrete electron and 
proton sources“. Electrocatalytic reduction of carbon monoxide 
on copper catalysts’ also uses a combination of electrons and 
protons, while the industrial Fischer-Tropsch process uses 
dihydrogen as a combined source of electrons and electrophiles for 
carbon monoxide coupling at high temperatures and pressures®. 
However, these enzymatic and heterogeneous systems are difficult 
to probe mechanistically. Molecular catalysts have been studied 
extensively®”? to investigate the elementary steps by which carbon 
monoxide is deoxygenated and coupled, but a single metal site that 
can efficiently induce the required scission of carbon-oxygen bonds 
and generate carbon-carbon bonds has not yet been documented. 
Here we describe a molybdenum compound, supported by a 
terphenyl-diphosphine ligand, that activates and cleaves the strong 
carbon-oxygen bond of carbon monoxide, enacts carbon-carbon 
coupling, and spontaneously dissociates the resulting fragment. This 
complex four-electron transformation is enabled by the terphenyl- 
diphosphine ligand**”’, which acts as an electron reservoir and 
exhibits the coordinative flexibility needed to stabilize the different 
intermediates involved in the overall reaction sequence. We 
anticipate that these design elements might help in the development 
of efficient catalysts for converting carbon monoxide to chemical 
fuels, and should prove useful in the broader context of performing 
complex multi-electron transformations at a single metal site. 
When using molecular systems to study the conversion of car- 
bon monoxide (CO) into multicarbon products, a common strategy 
invokes the stepwise addition of hydrides and electrophiles to metal- 
carbonyl compounds®. For example, sequential Lewis-acid-assisted 
metal-formyl generation, activation and migratory insertion has been 
demonstrated’, realizing a complex reductive coupling sequence in a 
single step. Under very reducing conditions, highly oxophilic metal 
centres have been shown to reduce, couple and deoxygenate CO, 
processes driven by the strength of the metal-oxygen interaction”"!'. 
A similar approach involves the insertion of CO into early transition 
metal hydride bonds; however, the stability imparted by the oxophilic 
metal centres complicates product release'*~'®. The addition of silyl 
electrophiles to anionic dicarbonyl complexes of group 5 metals 
has been shown to generate 17-bis(siloxyacetylene) compounds’”, 
constructing an acetylene diolate linkage similar to that seen for the 
reduction of CO by alkali metals'®. Examples of CO coupling are also 
known for the later transition metals; an anionic iron-dicarbonyl and 


a rhodium-porphyrin dimer demonstrate formal reduction but not 
deoxygenation of CO moieties!?”°. 

Complete bond-breaking transformations of diatomic small-molecule 
substrates at a single metal centre present a considerable synthetic 
challenge, as four or more electrons are often required (for example, N2 
cleavage requires six electrons; O2, four electrons; CO, six electrons). 
Taking this into consideration, the nature of the envisioned interme- 
diates in a deoxygenative CO coupling scheme varies markedly. Before 
the cleavage of C-O bonds can occur, reduced, electron-rich metal cen- 
tres that are capable of CO activation are necessary. The scission of CO 
at a single transition-metal site in the presence of strong reductants has 
been reported, and leads to the generation of an anionic molybdenum 
carbide?!. Following cleavage, the products and potential coupling 
precursors are more-oxidized complexes bearing metal-ligand multiple 
bonds. The formation of C-C bonds between CO ligands and substi- 
tuted alkylidyne moieties is known, but these carbyne ligands do not 
originate from CO in a direct fashion (refs 22, 23). 

Here, to facilitate access to the broad range of possible intermediates 
for C-O scission, C-C coupling and product release that are required 
to generate metal-free C,O, fragments at a single metal site, we used a 
ligand set with a propensity for diverse binding modes. The terphenyl- 
diphosphine ligand undergoes changes in the coordination mode of 
the central arene as a function of the oxidation state and primary coor- 
dination sphere of the metal**”°. Furthermore, the requirement for 
numerous reducing equivalents can be satisfied by arene reduction” in 
concert with oxidation-state changes at the metal. We anticipated that 
supporting molybdenum (Mo)—a metal known to enact both C-O 
cleavage”! and C-C coupling!’—on such a versatile ligand platform 
might provide access to the diverse intermediates necessary to perform 
mononuclear CO deoxygenation and coupling. 

The Mo"—dicarbonyl complex (compound 1; reported in ref. 24) can 
be reduced in a stepwise fashion with potassium-intercalated graphite, 
KCs (or with potassium naphthalenide, K(CjoHs)), to generate formal 
Mo? (2), Mo~" (3) and Mo" (4) species (Fig. 1). These complexes, 
spanning six oxidation states, have all been fully characterized by single 
crystal X-ray diffraction (XRD). The change in ligand-binding mode 
in the solid state is notable (Fig. 2). Dication 1 displays the coordina- 
tion of both phosphine arms, as well as an no—-arene interaction; in 
compound 2, one phosphine is dissociated. Further reduction leads 
to the formation of potassium-ion-bridged polynuclear clusters, dis- 
playing interactions between the cations and the free phosphine arm, 
the carbonyl ligands, and the ligand t-systems. In these oligomers, 
the central arene is distorted considerably from planarity, and the 
molybdenum-arene binding mode changes from 7° in compound 2 
to 7‘ in compound 3, and to 7 or 7 in the two unique subunits of 4. 
These differences, together with the specific localization of C-C single 
and double bond character, are consistent with substantial delocaliza- 
tion of reducing equivalents into the terphenyl scaffold. The ability of 
the pendant arene to act as a reservoir for electrons, adopting a partial 
cyclohexyldienyldianion character”, is essential to supporting these 


uncommon highly reduced species””. 
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Figure 1 | Deoxygenative coupling of CO to produce a C,0, fragment. 
The overall reaction (shown at the top) involves the transformation of two 
Mo-bound carbonyls, with the addition of four reducing equivalents (e7 ) 
and four equivalents of electrophile (E*), to generate a metal-free C,O, 
product. A detailed scheme follows. Starting with compound 1, successive 
electron loading (to 2, then 3, then 4)—using KCs and facilitated by 
electron storage in the pendant arene—leads to substantial CO activation. 
The addition of the silyl electrophile Me3SiCl to 3 results in C-O cleavage 
and the formation of silylcarbyne 7, proposed to proceed via a terminal 
molybdenum carbide, 8. From 7, two electrons are required for the 
formation of C, products (6b and 6c), which are spontaneously displaced 
by No, providing compound 5. Addition of bulkier silyl electrophiles 
(i-Pr3SiCl) to 3 or the more-reduced 4 results directly in the generation 
of compound 5 and a C, organic fragment (6a). Synthesis of these C,0, 
products (6a, 6b and 6c) from two CO ligands represents an overall four- 
electron transformation. E*, electrophile; e~, electron; Mo, molybdenum; 
OTF, trifluoromethanesulfonate. 


The elongation of C-O and contraction of Mo-CO bond lengths 
in the solid state (Supplementary Table 2) is consistent with appre- 
ciable back-bonding upon reduction; the delocalization of elec- 
tron density into the carbonyl ligands likewise plays a pivotal role 
in stabilizing such reduced compounds. Attenuation of the CO 
stretching frequencies across the reduction series (for compound 
1, 2,044 cm~! and 1,989 cm; 2, 1,887cm™! and 1,832 cm™}; 
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3, 1,657cm~! and 1,570cm~}; 4, less than 1,500cm~!) further sup- 
ports this claim; each set of stretches shifts as anticipated for the °C 
isotopologues. Given this strong CO activation, we explored electro- 
philic functionalization. Under an N2 atmosphere, treating compound 4 
with excess triisopropylsilyl chloride (i-Pr3SiCl) leads to the formation 
of N>-ligated complex 5 (ref. 24) as the major metal-containing product 
(75% yield, as determined by *'P{'H} nuclear magnetic resonance 
(NMR) spectroscopy); partial oxidation, producing compound 2, 
also occurs. Comparison with authentic samples confirmed hexa-iso- 
propyldisilyloxane ((i-Pr3Si),O; by gas chromatography/mass 
spectrometry) and the C20, fragment 6a (by °C{'H} NMR 
spectroscopy) as the organic by-products generated in the 
transformation of 4 to compound 5 (Fig. 1); '°C labelling confirmed that 
the C, unit originated from the carbonyl ligands. This remarkable reac- 
tion encompasses a trianionic metal complex that displays a high degree 
of CO activation, cleavage of the strong C-O bond, formation of a C=C 
bond, and spontaneous product release. These features distinguish this 
system from reported examples”*!!"!” of transition-metal-mediated CO 
coupling, which lead to two electron-reduced products or prove resilient 
to dissociation of the CO cleavage fragments from the metal centre. 

Balancing this reaction, however, reveals that an additional reducing 
equivalent is required, beyond the three that are stored in complex 4. 
The generation of compound 2 suggests that trianion 4 may be act- 
ing as a sacrificial reductant. Consistent with this hypothesis, adding 
i-Pr3SiCl to the less-reduced compound 3 (Fig. 1) lowers the yield 
of coupling products (to 50%, as determined by *'P{'H} NMR spec- 
troscopy) and increases the oxidation that generates compound 2. 
Performing the reaction with an additional electron equivalent— 
in the form of K(Cj9Hg)—leads to greater than 90% conversion of 
4 to compound 5 (as measured by NMR spectroscopy), with no 
observation of the oxidation product, 2. Given this high yield, we 
targeted a closed synthetic cycle; however, addition of CO gas to 5 
does not regenerate 2. 

Anticipating that a smaller silyl electrophile might facilitate 
the observation of reaction intermediates, we treated a thawing 
tetrahydrofuran solution of dianion 3 with excess trimethylsilyl chloride 
(Me3SiCl). The silylcarbyne complex 7 was formed cleanly (Fig. 1), 
as suggested by an alkylidyne resonance at 355.85 parts per million 
(p.p.m.) in the '9C{'H} NMR spectrum, and confirmed by single- 
crystal XRD (Fig. 2). To support the higher oxidation state of molyb- 
denum in 7, the ligand adopts a pseudo-square pyramidal coor- 
dination environment, with no notable metal—arene interaction. 
Hexamethyldisiloxane ((Me3Si)2O) was detected in the reaction 
mixture by '°C{'H} NMR spectroscopy, accounting for the cleaved 
oxygen atom. The formation of 7 from 3 represents a complete scission 
of the C=O bond—a six-electron transformation. This conversion is 
accomplished in high yields, without additional reducing equivalents, 
owing to the ability of the supporting ligand to facilitate the storage of 
multiple electrons. 

The formation of deoxygenated compound 7 suggests the 
intermediacy of a terminal molybdenum carbide complex, 8, 
generated after loss of silyl ether. To probe the viability of this 
proposal, we targeted 8 by independent synthesis (Fig. 1). At low 
temperatures (—20°C), the desilylation of 13C-labelled 7 with 
tetra-n-butylammonium fluoride (n-BugNF) results in clean con- 
version to a new complex with a resonance in the *C{‘H} NMR 
spectrum at 546.20 p.p.m. (Fig. 3). This chemical shift is diag- 
nostic of terminal carbide species”!?8*°, and indicates that such 
a moiety is accessible in our system, even though examples of 
neutral terminal transition metal carbides are exceedingly rare*””, 
and unprecedented for molybdenum. The coupling pattern 
observed in the '°C{'H} NMR spectrum is consistent with isotop- 
ically enriched carbide (546.20 p.p.m.) and CO (233.16 p.p.m.) lig- 
ands both being coordinated to the same metal centre. Although 
compound 8 decomposes to a complex mixture at room tempera- 
ture, treatment with Me;3SiCl at —80°C leads to the clean formation 
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Figure 2 | X-ray crystal structures of compounds 2, 3, 4 and 7. Structures 
displaying a full molybdenum terphenyl-diphosphine unit are shown 

at the top; truncated enlargements of the molybdenum -arene cores 

of complexes 2 and 3 are shown in the inset. Reduction of compound 

2 to generate 3 and 4 leads to deplanarization of the arene, consistent 

with a partial cyclohexyldienyldianion character. In the more-oxidized 


of 7 upon warming, showing that the carbide could be an intermediate 
en route to 7 from 3. 

Recognizing that the silylcarbyne 7 incorporates both C-Si and C-O 
linkages, we explored the formation of C-C bonds from this species. 
Treating 7 with two equivalents of KCs under N; leads to the quanti- 
tative formation of compound 5 and an unbound C,0; fragment (6b; 


compound 7, no metal-arene interaction is observed. The molecular 
structures are displayed with anisotropic displacement ellipsoids shown at 
the 50% probability level. Co-crystallized solvent molecules, potassium- 
bound tetrahydrofuran molecules, and hydrogen atoms are omitted for 
clarity. A single molybdenum core is represented for the polynuclear 
clusters 3 and 4. 


Fig. 1). Subsequent addition of Me;SiCl produces the bis(trimethylsilyl) 
ketene 6c. This reaction sequence demonstrates the ability of 7 to 
undergo reductive C-C coupling followed by dissociation of the par- 
tially deoxygenated C, unit from molybdenum. An additional reduc- 
ing equivalent is pre-loaded in the system when comparing trianion 4 
to dianion 3, resulting directly in coupling chemistry from the former. 
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Figure 3 | NMR spectroscopic data. These 'C{'H} NMR spectroscopic 
data (126 MHz, 25°C) are for a solution of compound 8, bearing 
3C-labels on the CO-derived carbon atoms, dissolved in tetrahydrofuran/ 
benzene-d°. The coupling pattern (7J(C, C) = 3.46 Hz; *J(P, C) = 3.26 Hz; 
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27(P, CO) = 12.55 Hz) and the chemical shifts of the isotopically 
enriched carbon atoms are consistent with the coordination of carbide 
(546.20 p.p.m.) and CO ligands (233.16 p.p.m.) to the same metal centre. 
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Our work establishes a series of design elements to be considered 
for monometallic complexes capable of the conversion of CO to C,0; 
products. Specifically, a supporting ligand that can store reducing 
equivalents in a pendant arene facilitates the activation and cleavage 
of the C-O bond, generating an intermediate molybdenum carbide. 
Subsequent C-C coupling is facilitated by silylation and reduction. This 
combination of bond breaking, bond forming, and organic-product 
release is unprecedented for a homogeneous monometallic system. 
Moreover, although metal carbides are commonly invoked intermedi- 
ates for heterogeneous CO catenation, never before has a well-defined, 
terminal metal carbide complex been demonstrated as a CO-coupling 
intermediate. The electronically and coordinatively flexible ligand 
architecture, with variable arene-binding modes and a hemilabile phos- 
phine donor, is instrumental in supporting these diverse moieties, facil- 
itating the activation, scission, and coupling of CO to generate C,0; 
species. The transformations observed here provide precedent relevant 
to the conversion chemistry of oxygenated C, feedstocks to partially 
deoxygenated C, products, showcasing multi-electron reactions at a 
single metal site. 
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Slab melting as a barrier to deep carbon subduction 


Andrew R. Thomson!?, Michael J. Walter!, Simon C. Kohn! & Richard A. Brooker! 


Interactions between crustal and mantle reservoirs dominate 
the surface inventory of volatile elements over geological time, 
moderating atmospheric composition and maintaining a life- 
supporting planet!. While volcanoes expel volatile components 
into surface reservoirs, subduction of oceanic crust is responsible 
for replenishment of mantle reservoirs”*. Many natural, ‘superdeep’ 
diamonds originating in the deep upper mantle and transition zone 
host mineral inclusions, indicating an affinity to subducted oceanic 
crust*”. Here we show that the majority of slab geotherms will 
intersect a deep depression along the melting curve of carbonated 
oceanic crust at depths of approximately 300 to 700 kilometres, 
creating a barrier to direct carbonate recycling into the deep mantle. 
Low-degree partial melts are alkaline carbonatites that are highly 
reactive with reduced ambient mantle, producing diamond. Many 
inclusions in superdeep diamonds are best explained by carbonate 
melt-peridotite reaction. A deep carbon barrier may dominate the 
recycling of carbon in the mantle and contribute to chemical and 
isotopic heterogeneity of the mantle reservoir. 

Altered oceanic crust incorporates appreciable carbon, which is 
added by magmatic and hydrothermal processes’, and by addition of 
CO, during interaction of basalt with sea water”. Together, these altera- 
tion processes result in subducting lithosphere that contains an average 
of ~2 weight per cent (wt%) CO in the uppermost volcanic section 
and 100-5,000 p.p.m. CO, throughout the remaining 7 km of crust’. 
Crustal carbon initially contains a mixture of reduced hydrocarbons® 
and oxidized carbonates’. However, metamorphic re-equilibration of 
slab carbon with ferric iron and/or oxidizing fluids produced during 
serpentine dehydration at sub-arc conditions probably converts most 
slab carbon to carbonate!”. Some of this carbon is returned to the 
exosphere in volcanic arcs, but both theoretical’! and experimental’ 
studies suggest that a considerable quantity of carbon may survive 
beyond slab dehydration, and be subducted into the mantle. 

Carbon is insoluble in mantle silicate minerals! and is stored either as 
carbonate, carbide or diamond, depending on the oxidation state. Under 
oxidizing conditions, carbonate lowers the melting point (solidus) 
of mantle peridotite by up to 500°C compared with volatile-free 
mantle'*. However, at the more reducing conditions prevailing deeper 
in the upper mantle and transition zone, carbon will be stored as dia- 
mond or carbide minerals!°, where it does not appreciably influence 
melting. 

Superdeep diamonds originate from depths beneath the lithospheric 
mantle (2200 km) and are the only direct samples of the deep mantle 
carbon reservoir. Inclusions in these diamonds are dominated by 
upper mantle and transition zone minerals, which are mostly asso- 
ciated with subducted mafic lithologies rather than peridotite*”!®. 
Many superdeep diamonds are made of isotopically light carbon®” 
and, where measured, their inclusions contain isotopically heavy 
oxygen!’, unambiguously indicating an origin from recycled sur- 
face material®”!’. The elevated trace element abundances of many 
silicate inclusions suggest crystallization from a low-degree melt, 
thought to be generated from melting of subducted oceanic crust”. 
Here we examine the fate of subducting carbonated mid-ocean-ridge 
basalt (MORB) as it reaches the transition zone, and the potential for 


melt-mantle reactions to reproduce superdeep diamonds and their 
distinctive inclusion assemblages. 

Previous experimental studies have investigated the melting behav- 
iour of carbonated basalt at elevated pressures, but only one extends 
beyond 10 GPa (ref. 19). These studies show a remarkable diversity 
in melting behaviour, making extrapolation to higher pressures diffi- 
cult. In addition, the bulk compositions employed in previous studies 
often contain considerably more CO, than mean oceanic crust, and fall 
outside the compositional field of natural MORB rocks (see Methods, 
Extended Data Fig. 1 and Extended Data Table 1). To understand 
better the melting behaviour of deeply subducted oceanic crust, we 
determined the melting phase relations of a synthetic MORB com- 
position containing 2.5 wt% CO, between 3 and 21 GPa (Methods). 
Our starting composition replicates the major element composition of 
basaltic rocks from International Ocean Discovery Program (IODP) 
hole 1256D” and falls within the range of natural crust compositions”! 
(Extended Data Fig. 1). 

We observe subsolidus phase assemblages containing garnet, 
clinopyroxene, a SiO2 polymorph, and Ti-rich oxide at all pressures 
(Extended Data Figs 2, 3 and Extended Data Table 2). The carbon 
component was either CO, dolomite, magnesite or magnesite plus 
Na-carbonate depending on the pressure, and the positions of solid 
carbonate phase boundaries are consistent with previous studies””*. 
Near-solidus partial melts are CO2-bearing silicate melts below 7 GPa, 
and silica-poor calcic carbonatites above 7 GPa. The alkali component 
of carbonatite melts increases with pressure (Extended Data Fig. 4), 
and all melts have high TiO2/SiO2 (see Methods and Extended Data 
Figs 2-5, Extended Data Table 2 and Supplementary Tables 1-4 for 
detailed results). 

The melting temperature of carbonated oceanic crust is tightly 
bracketed from ~3 to 21 GPa (Fig. 1). Melting temperatures 
increase steadily with increasing pressure until about 13 GPa, when 
the solidus dramatically drops over a narrow pressure interval by 
~200°C. This drop in solidus temperature is caused by a change 
in clinopyroxene composition towards a more sodium-rich com- 
position above 13 GPa due to dissolution of sodium-poor pyroxene 
components into coexisting garnet. Eventually, clinopyroxene 
becomes so sodium-rich that a coexisting Na-carbonate mineral 
([Nao.97Ko.03]0.33[Cao.s6Mgo.11Feo.03]0.67CO3) stabilizes in the subsolidus 
assemblage, causing the depression along the solidus. The loss of the 
sodium-poor clinopyroxene component, and the extended stability of 
sodic clinopyroxene in the absence of an alternative sodium-bearing 
silicate phase, is consistent with previous studies”*. Above 16 GPa 
the solidus changes little with pressure, remaining at ~1,150°C, con- 
sistent with the solidus observed in a sodium-rich simplified system 
in which sodic carbonate ([Na,K]o33Cao.67CO3) controls melting 
temperatures”*. The major difference between this work and the pre- 
vious study of carbonated MORB above 8 GPa (ref. 19) is the different 
phase assemblage resulting from the lower and more realistic CO2 and 
CaO contents of our bulk composition. Previous bulk compositions 
with higher CaO contents (Extended Data Figs 1 and 5) are located 
on the calcium-rich side of the majorite-clinopyroxene tie-line and 
stabilize aragonite as the carbon-hosting phase, which can incorporate 
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Figure 1 | The melting curve of carbonated MORB compared to hot 

and cold subduction geotherms”*. The stability fields of carbon-bearing 
phases are identified in different colours. Experiments performed are 
marked by filled triangles indicating their relationship to the solidus, larger 
symbols mark solidus brackets. The solidus ledge creates a narrow depth 
interval where slab temperatures intersect the melting curve, producing a 
focused region of melt generation at the top of the transition zone. 


considerable Na2O. The lower CO; content in our bulk composition 
results in a smaller proportion of carbonate, of which the dominant 
species is sodium-poor magnesite. Thus, sodic clinopyroxene remains 
stable as an alkali-host, coexisting with stoichiometric Na-carbonate 
to high pressures. 

The deep solidus depression in carbonated oceanic crust at upper- 
most transition zone conditions creates a key control on the recycling 
of mantle carbon. Extrapolation of the range of modern-day oceanic 
crustal geotherms into the transition zone” reveals that the majority of 
slabs will intersect our solidus for carbonated recycled MORB (Fig. 1), 
producing carbonatite melt. Given the expected temperature profile 
in the average subducted slab”° we estimate that melting would occur 
to depths of at least 7 km into the crustal section. Only the coldest 
modern-day slabs escape the solidus depression and are able to carry 
their carbonate cargo beyond the transition zone. If ancient slabs were 
hotter’, it seems likely that carbonate subduction through the tran- 
sition zone and into the lower mantle has been limited throughout 
Earth’s history. While the natural variability of subducting slabs (for 
example, composition, age, temperature) will have created some range 
in melting behaviour, the depression of the carbonated eclogite solidus 
will remain an efficient barrier. Thus, direct recycling of carbon into 
the lower mantle may have been highly restricted throughout most of 
the Earth’s history, instead being redistributed throughout the upper 
mantle. 

Carbonatitic melts are predicted to be mobile at mantle conditions 
due to their low viscosity and ability to wet silicate minerals”’, so 
should percolate out of the slab and infiltrate the overlying perid- 
otitic mantle?>. Experiments suggest that below ~250 km, ambient 
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mantle oxygen fugacity is reducing, and a free metal phase may be 
present in the mantle”®. Under such conditions carbonate melt is 
unstable and will reduce to diamond plus oxygen by a ‘redox-freezing’ 
reaction”® such as: MgCO3 + 2Fe® = 3(Mgp 33, Fe7*9.67)O +C. Thus, the 
expulsion of carbonatite melts due to melting of oceanic crust along 
the solidus depression provides an ideal environment for diamond 
growth across a depth interval of ~300-700 km. We predict that the 
interaction between MORB-derived carbonatite melt and ambient 
peridotite is capable of reproducing many of the characteristics of 
superdeep diamonds and the mineral inclusions that they capture from 
this depth interval*°. The most common silicate minerals identified 
in superdeep diamonds are majorite garnet, and a titanium-bearing, 
calcium-silicate phase commonly interpreted as retrogressed ‘calcium 
perovskite’*®78, Barometric estimates of the crystallization pressures 
for these majorite inclusions indicate they crystallized between 10 and 
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Figure 2 | Composition of majoritic garnet minerals from previous 
experimental studies, inclusions in diamonds and reaction 
experiments. a—c, The red field outlines the approximate range of 
peridotitic majorite compositions, the blue field outlines the range of 
MORB majorites from pressures above the carbonated MORB solidus 
ledge (29 GPa). Na (per formula unit (pfu)) plotted against Mg number 
(Mg number = Mg/[Mg+Fe]) (a), Ca number (Ca number = Ca/ 
[Ca+Mg-+Fe]) (b), and Ti (pfu) (c). Data and corresponding references for 
this figure are provided in the online source data file. 
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16 GPa (ref. 5), and inclusions of calcium perovskite are constrained by 
their chemistry to have formed between ~10 and 20 GPa (refs 6, 18). 
These pressures are remarkably consistent with the range of pres- 
sures at which slab crustal geotherms are predicted to intersect the 
carbonated solidus depression (Fig. 1). 

Redox reactions in the mantle are complex and involve silicates, 
many containing iron that exists in both ferrous (Fe?*) and ferric 
form (Fe**). To test the melt-mantle interaction model, we recreated 
the infiltration process in a second set of experiments by partially 
equilibrating a model slab melt with an iron-metal-bearing transition 
zone peridotite assemblage at 20 GPa (see Methods for details). We 
observe a reaction zone between the alkaline carbonatite melt and the 
initial peridotitic assemblage of majorite, wadsleyite, calcium-silicate 
perovskite and iron metal that consists of sodium-rich majoritic gar- 
net, Ca[Si, Ti]O3 perovskite, ferrous ringwoodite (Mg number ~75), 
ferropericlase (Mg number ~0.4) and diamond (Extended Data Figs 6, 7 
and Extended Data Table 3). We compare the resulting mineral com- 
positions with previous experimental data for peridotite and MORB 
systems to investigate whether natural inclusion assemblages might 
preserve a record of mineral-melt reactions. 

The compositions of the majority of superdeep majoritic garnet 
inclusions are not typical of those expected in either peridotitic or 
eclogitic bulk compositions (Fig. 2) and instead lie between these 
two end-members. These intermediate compositions have previ- 
ously been described as pyroxenitic, and it was suggested that the 
transition zone may harbour a large component of this rock type’®. 
Our results suggest an alternative explanation. In Fig. 2 the majoritic 
garnets produced during the experimental melt-mantle interaction 
are intermediate between peridotitic and eclogitic compositions, and 
cover much of the range seen in the diamond inclusions. The chem- 
ical imprint imparted by the MORB carbonatite on the peridotitic 
mantle is recorded in the inclusions as elevated Ca number, Na and 
Ti contents alongside depleted Mg number. Our experiments only 
demonstrate the composition of garnets produced near the beginning 
of the melt-mantle interaction sequence, and we suggest that the 
intermediate character of the natural inclusions records a snapshot 
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Figure 3 | Composition of ferropericlase minerals from previous 
experimental studies, inclusions in diamonds and reaction 
experiments. a, b, Blue arrows indicate the compositional evolution 
expected as melt-mantle interactions progress. Data and corresponding 
references for this figure are provided in the online source data file. 


of the infiltration and reaction of slab-derived carbonatite melt with 
peridotite. 

Experimental calcium perovskites have high titanium (~40-60 mol% 
CaTiO3) and are essentially magnesium-free, features observed 
throughout the global range of ‘calcium perovskite’ inclusions 
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Figure 4 | Schematic of the deep mantle carbon cycle. Arrows 
represent paths and estimates of the relative magnitudes of carbon 
fluxes. Downwelling slabs dehydrate at sub-arc depths but retain the 
majority of their carbon cargo. Upon reaching the transition zone they 
produce carbonatite melts (this study) along the solidus ledge that 
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infiltrate’* and react with the overlying mantle (this study). This causes 
diamond production, refertilization and associated metasomatism of the 
surrounding mantle. The melting of recycled crust in the transition zone 
essentially prevents carbon transport into the lower mantle (LM). 
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(Extended Data Fig. 8). Thus, our reaction experiments reproduce the 
unique characteristics of diamond-hosted ‘calcium perovskite’ inclu- 
sions. Crystallization by reaction between a low-degree carbonated 
melt and peridotite is also consistent with the extremely elevated trace 
element contents of diamond-hosted ‘calcium perovskites’ inclusions”. 

Probably the most abundant inclusions in superdeep diamonds are 
magnesium-iron oxide ([Mg,Fe]O), which are often interpreted to 
indicate diamond growth in the lower mantle*. However, our exper- 
iments demonstrate that ferropericlase can be produced in reactions 
between carbonatitic melt and reduced mantle peridotite at upper 
mantle pressures rather than requiring a lower mantle origin”. Figure 3 
demonstrates that natural ferropericlase inclusions are almost all 
iron-rich relative to ferropericlase expected in mantle peridotite, and 
their compositions form arrays towards higher NiO and lower Na,O 
with increasing magnesium number. Our experimental ferropericlase 
compositions lie at the end of the arrays and are iron-rich because the 
peridotite starting material was initially iron-saturated. We suggest 
that, like the majorite inclusions, the array of intermediate ferroperi- 
clase compositions record the progressive reaction of carbonatite melt 
and ambient mantle. 

The melting-phase relations of recycled oceanic crust suggest that 
slabs should undergo melting and loss of carbonate components in the 
transition zone (Fig. 4), a process that has considerable implications for 
the deep carbon cycle. The compositions of diamond-hosted inclusions 
provide strong evidence of this process and confirm that carbon must 
survive subduction beyond sub-arc dehydration reactions. We predict 
that carbon is rarely transported beyond the transition zone and instead 
refertilizes the upper mantle as diamond. Oxidation of diamond-bearing 
mantle upon upwelling can lead to redox melting!* beneath the 
lithosphere and contribute markedly to the generation and geochemical 
signature of surface lavas. This process also probably contributes to the 
formation of distinctive chemical and isotopic reservoirs in the mantle*’. 
Superdeep diamonds provide a physical record of carbon recycling 
above subducting slabs, which can be used to infer the residence time 
of carbon in the mantle. This residence time is regulated by rates of 
subduction, convective mantle upwelling and melting beneath the 
lithosphere, and could occur over a range of timescales, perhaps as short 
as tens to hundreds of millions of years, suggesting the mantle carbon 
cycle can be considerably more vigorous than previously estimated”. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 

Starting materials. The starting material for experiments to determine the 
melting-phase relations of carbonated MORB (ATCM1) replicates basalts from 
the IODP 1256D from the Eastern Pacific Rise” (the reported composition of 
IODP 1256D basalts is the average of all analyses presented in table T17 of ref. 20) 
with an added 2.5 wt% CO, (Extended Data Table 1). This material was formed by 
mixing high-purity SiOz, TiO, Al,O3, FeO, MnO, MgO, Ca3(PO4)2 and CaCO3, 
which were fired overnight at temperatures of 400-1,000 °C, of appropriate 
weights in an agate mortar under ethanol. This mixture was decarbonated and 
fused into a crystal-free glass in a one-atmosphere tube furnace by incrementally 
increasing the temperature from 400 to 1,500°C before drop quenching into water. 
Subsequently weighed amounts of CaCO3, NaxCO3 and K3CO3 were ground into 
the glass, introducing the alkali and CO; components. After creation, the starting 
material was stored at 120°C to avoid absorption of atmospheric water. Starting 
material ATCM2 replicates the near-solidus melt composition measured in melt- 
ing experiments at 20.7 GPa and 1,400/1,480°C. This was created by grinding nat- 
ural magnesite and synthetic siderite with high-purity CaCO3, NaxCO3, K,COs, 
SiO», TiO, Al,O3 and Ca3(PO,)>. Synthetic siderite was created in a cold-seal 
pressure vessel experiment run at 2 kbar and 375°C for 7 days. A double Au cap- 
sule design containing iron (II) oxalate dehydrate in the inner and a 1:1 mixture 
of CaCO; and SiO, in the outer capsule produced a pale beige powder confirmed 
as siderite using Raman spectroscopy. The material for a sandwich experiment, to 
ensure near-solidus melt compositions were accurately determined at 20.7 GPa, 
was formed of a 3:1 mixture of ATCM1:ATCM2. 

The transition-zone peridotite mineral assemblage in reaction experiments 
was synthesized at 20.7 GPa and 1,600°C for 8h from a mixture of KR4003 nat- 
ural peridotite*! with an added 2.5 wt% Fe metal. In reaction runs the recovered 
synthetic peridotite was loaded in a second capsule, surrounded by the ATCM2 
near-solidus melt composition. Additional reaction-type experiments were per- 
formed on ground mixtures of peridotite and melt compositions. In these exper- 
iments PM1 pyrolite*? was used as the peridotite component and mixed with 
ATCM2 melt in 9:1, 7:3 and 1:1 weight ratios in Fe capsules. A single mixed exper- 
iment was performed in a Au capsule and used a starting mix of PM1:Fe:ATCM2 
in 16:1:4 molar ratio. 

Experimental techniques. High-pressure experiments were performed using a 
combination of end-loaded piston cylinder (3 GPa) and Walker-type multi anvil 
(5-21 GPa) experiments at the University of Bristol. Piston cylinder experiments 
employed a NaCl-pyrex assembly with a straight graphite furnace and Al,O; inner 
parts. Temperature was measured using type D thermocouple wires contained 
in an alumina sleeve and positioned immediately adjacent to the AugoPd29 sam- 
ple capsule that contained the powdered starting material. We assume that the 
temperature gradient across the entire capsule (<2 mm) was smaller than 20°C 
(refs 33, 34). The hot piston-in technique was used with a friction correction of 
3% applied to the theoretical oil pressure to achieve the desired run conditions™. 

Multi-anvil experiments were performed using Toshiba F-grade tungsten 
carbide cubes bearing 11, 8 or 4mm truncated corners in combination with a 
pre-fabricated Cr-doped MgO octahedron of 18, 14 or 10mm edge length, respec- 
tively. The relationship between oil-reservoir and sample pressure for each cell 
was calibrated at room and high temperature (1,200 °C) by detecting appropriate 
room temperature phase transitions of Bi, ZnTe and GaAs and bracketing trans- 
formations of SiO, (quartz-coesite and coesite-stishovite), MgoSiO4 (a-8 and 
B-7y) and CaGeO; (garnet-perovskite). Calibrations are estimated to be accurate 
within +1 GPa. In all experiments, desired run pressure was achieved using a slow, 
Eurotherm controlled, pressure ramp of <50 tonnes per hour. Experiments were 
heated after high pressure was reached with high temperatures generated using 
stepped graphite (18/11 cell) or straight LaCrO; furnaces (14/8 and 10/4 cells) and 
monitored with type C thermocouple wires. Two 10/4 experiments, performed 
during a period of repeated LaCrO; heater failures, used rolled 40-j1m-thick Re 
furnaces. Temperature was quenched by turning off the furnace power before a 
slow decompression ramp (half the rate of experiment compression) to ambient 
conditions. Samples were contained in Au capsules unless temperatures exceeded 
its thermal stability, in which case AugoPd29 or Au7sPd5 capsules were used. Run 
durations all exceeded 600 min and are reported in Extended Data Tables 2 and 3. 
Temperature uncertainties were believed to be less than +20, 30 or 50°C for 18/11, 
14/8 and 10/4 cells respectively***”. 

Recovered samples were mounted longitudinally in epoxy, polished under oil 
and repeatedly re-impregnated with a low viscosity epoxy (Buelher EpoHeat) 
to preserve soft and water-soluble alkali carbonate components present in run 
products. 

Analytical techniques. Polished and carbon-coated run products were imaged 
in backscatter electron mode (BSE) using a Hitachi S-3500N scanning electron 
microscope (SEM) with an EDAX Genesis energy dispersive spectrometer to 


identify stable phases and observe product textures. Subsequently, wavelength 
dispersive spectroscopy (WDS) was performed using the Cameca SX100 
Electron Microprobe or the Field Emission Gun Jeol JKA8530F Hyperprobe at 
the University of Bristol to achieve high-precision chemical analyses of run prod- 
ucts. Analyses were performed using an accelerating voltage of 15 or 12kV on 
the respective instruments, with a beam current of 10nA. Calibrations were per- 
formed during each session using a range of natural mineral and metal standards 
and were verified by analysing secondary standards (as described previously’). 
Silicate phases were measured using a focused electron beam whereas carbonates 
and melts were analysed using an incident beam defocused up to a maximum 
size of 101m. Count times for Na and K were limited to 10s on peak and 5s on 
positive and negative background positions. Peak count times for other elements 
were 20-40s. Additional analyses of the calcium perovskite phases grown during 
reaction experiments, measuring only SiO, and MgO content, were made using 
the Jeol instrument at 5kV and 10nA to ensure reported MgO contents were not 
influenced by secondary fluorescence from surrounding material. 

The identity of experimental-produced minerals was determined using Raman 

spectroscopy as a fingerprint technique. Spectra were collected using a Thermo 
Scientific DXRxi Raman microscope equipped with an excitation laser of either 
455 or 532nm. 
Choice of bulk composition and comparison with previous studies. Studies 
that investigate the alteration of oceanic crust have demonstrated that carbon 
incorporation does not simply occur by the addition of a single carbonate species 
to MORB’. It instead appears to occur by a complex amalgamation of hydrocarbon 
and graphite deposition related to hydrothermal fluxing above magma chambers 
at the mid-ocean ridge* and underwater weathering”**° where seawater-derived 
CO, reacts with leached crustal cations, often in veins. It is believed that the 
quantity of biotic organic carbon in the crustal assemblage is negligible compared 
with abiotic organic compounds and inorganic carbonates®. These processes result 
in a layered crustal assemblage that, in the uppermost few hundred metres can 
contain up to a maximum of 4wt% CO, in rare cases”*? but more commonly 
<2 wt% COs (refs 8, 9, 39). Beneath 500 m depth the carbon content drops to 
between 100 and 5,000 p.p.m. CO, throughout the remainder of the 7-km-thick 
basaltic section’, and is mostly organic hydrocarbon species. The upper 300 m are 
regularly altered and can be generally thought to have compositions similar to the 
altered MORB rocks analysed previously*!. Deeper portions of the MORB crust 
retain their pristine MORB compositions. It is therefore apparent that carbonated 
eclogite bulk compositions used in previous studies, where at least 4.4 wt% CO 
was added to an eclogite by addition of ~10 wt% carbonate minerals, may not 
be good analogues of naturally subducting crustal sections. The compositions of 
these starting materials from previous studies'**?-*° can be found in Extended 
Data Table 1. We do not include the composition of the starting material used by 
refs 47 or 48 as these studies were conducted in simplified chemical systems so 
are not directly comparable with these natural system compositions. 

However, as some of the previous studies rightly identify and discuss, the com- 
position of deeply subducted MORB is unlikely to be the same as that entering 
the subduction system. One process widely believed to alter the composition of 
downwelling MORB is sub-arc slab dehydration. Pressure (P)-temperature (T) 
paths of subducted slabs”* can be compared with experimental studies of hydrous, 
carbonated and H,O-CO,-bearing eclogite compositions'****9? and thermo- 
dynamic models!!*° to conclude that slabs experience dehydration at sub-arc 
conditions (that is, 1-5 GPa) but will generally not reach high enough temper- 
atures to undergo melting. Therefore, they will by and large retain their carbon 
components although some fraction may be lost by dissolution into aqueous 
fluids®!>. It is believed that sub-arc dehydration is capable of removing SiO, 
from the subducting assemblage, and previous carbonated MORB compositions 
were therefore designed to be considerably silica undersaturated (relative to fresh/ 
altered MORB)!*3-*°, While studies**~°° do indicate that SiO) can become soluble 
in H2O at high pressures, they infer that the solubility of silica in hydrous fluids 
only exceeds ~1 wt% at T> 900°C at 1 GPa (higher T at higher P). In contrast, 
slab dehydration occurs on all prograde slab paths at T < 850°C. Additionally, 
the composition of quenched hydrous fluids coexisting with MORB at 4GPa 
and 800°C (ref. 57) indicate that a maximum of ~12 wt% SiO, can dissolve in 
the fluid. Given that there should be considerably less than 10 wt% H2O (more 
likely << 5 wt% H,0) in subducting assemblages, this suggests a maximum SiO, 
loss in subducting MORB lithologies of ~0.6-1.2 wt%. The compositions used in 
previous studies have SiO; depletions ranging from 3 wt% up to, more commonly, 
6-10 wt% SiO, relative to MORB. 

We further investigated the effect of oceanic crust alteration and sub-arc dehy- 
dration on the composition of subducted MORB rocks by compiling a data set of 
altered MORB"! and exhumed blueschist, greenschist and eclogite facies rocks from 
exhumed terrains worldwide to compare them with fresh MORB”!, our starting 
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material and previous starting materials. We then assess the relevance of our 
starting material based on the composition of natural MORB rocks, rather than 
using models of the subduction process that contain few observable constraints. 
Results of this comparison are plotted in Extended Data Fig. 1. This analysis 
confirms that relative to fresh MORB, altered MORB and exhumed crustal rocks 
are somewhat depleted in SiO», up to a maximum of 6 wt% SiO, in the most 
extreme case, but more commonly 0-3 wt% SiO». Thus, many previous start- 
ing materials are too silica undersaturated to be good analogues of subducting 
MORB. Furthermore, this analysis reveals that altered and exhumed MORB are 
not enriched in CaO compared with fresh MORB, if anything they actually con- 
tain lower CaO on average. In contrast, all previous starting materials are enriched 
in CaO compared with fresh MORB. This is because most previous studies intro- 
duced the carbon component to their experiment by adding ~10 wt% calcite to an 
eclogite-base composition. We note that SLEC1 (ref. 43) was not created in this 
manner, but instead this composition falls far from the MORB field as the authors 
used an eclogite xenolith erupted by a Hawaiian volcano as a base material. By 
plotting the position of the maj-cpx join, defined by the composition of our 
experimental phases plotted in Extended Data Fig. 5, onto Extended Data Fig. 1a, 
we demonstrate that our bulk composition (ATCM1), ALL-MORB”, the vast 
majority of the fresh MORB field, altered“! and exhumed MORB samples fall on 
the CaO-poor side of this join, that is, on the Mg+Fe-rich side. Therefore, mag- 
nesite will be the stable carbonate phase in these compositions at high pressure 
(above dolomite breakdown). In contrast, all previous bulk compositions plot 
on the Ca-rich side of this join, or are very depleted in SiO», and therefore fall in 
a different phase field to the overwhelming majority of subducted MORB. This 
difference causes a considerable difference in the phase relations of our starting 
material relative to those used in previous studies. 

We acknowledge that no single bulk composition can be a perfect analogue 
for the entire range of subducting MORB compositions, however, ATCM1 is a 
good proxy for sections of the MORB crust between ~300 m and 7 km depth that 
have unaltered major element compositions and low CO) contents. Additionally, 
ATCM1 remains a better analogue for the uppermost portions of the MORB crust 
than starting materials employed in previous studies because its CO2 content 
is within the range of natural rocks while it is also not oversaturated in CaO or 
over depleted in SiO>. This is despite it falling towards the SiO2-rich end of the 
compositional spectrum of subducting MORB rocks. 

Slab fO2 and carbonate survival to transition zone conditions. Recent exper- 
iments have suggested that carbonate in eclogitic assemblages may be reduced 
to elemental carbon, either graphite or diamond, at depths shallower than 
250km (ref. 58). However, subducting slab geotherms are much colder than 
the experimental conditions investigated by this study, and additionally they 
are believed to contain considerable ferric iron that is further increased during 
de-serpentinization!. Indeed, several observations of carbonate inclusions in 
sub-lithospheric diamonds®”? require that slab carbon remains oxidized and 
mobile until diamond formation, far deeper than 250 km. Given the numer- 
ous observations from natural diamond samples, the general uncertainty in the 
mantle’s fO, structure and the lack of any conclusive experimental evidence that 
subducting carbon becomes reduced before reaching the transition zone we posit 
that nearly all subducting carbon is stable as carbonate throughout the upper 
mantle in subducting MORB assemblages. 

Carbonated MORB melting. Extended Data Table 2 presents the run conditions, 
durations and phase proportions in all carbonated MORB melting experiments, 
which are also summarized in Extended Data Fig. 2. Phase and melt compositions 
are presented in the Supplementary Tables 1-4. Phase proportions are calculated 
by mass balance calculations that use the mean composition of each phase as 
well as the reported 1o uncertainty in this mean as inputs. We note that the 1a 
uncertainty for some oxides in garnet and clinopyroxene minerals occasionally 
exceeds 1 wt%, although it is normally much smaller than this. These large uncer- 
tainties are a function of the small crystal sizes present in some runs, and not a 
function of sluggish reaction kinetics. Phase proportion calculations were run in 
a Monte Carlo loop of 10,000 calculation cycles where a varying random error 
was added to each oxide in each mineral phase during each iteration. Overall the 
distribution of varying random errors for each oxide form a Gaussian distribution 
with standard deviation equal to the reported 1o uncertainty of measurements. 
The reported proportions are the numerical mean of all calculation cycles and the 
r’ value reports the average squared sum of residuals. Low r values indicate that 
chemical equilibrium is likely to have been achieved and that mineral and melt 
compositions have been accurately determined. 

Representative BSE images of the polished experiments are shown in Extended 
Data Fig. 3. Garnets in experiments at all pressures contain abundant SiO) inclu- 
sions. In subsolidus experiments the number of inclusions increases and the defi- 
nition of mineral boundaries deteriorates, which makes accurate analysis of garnet 
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compositions increasingly challenging. In supersolidus runs, garnet minerals adja- 
cent, or near to, carbonatite melt pools have well defined edges and contain fewer 
inclusions. However, far from quenched melts the textures of garnets remain small 
and pervasively filled with inclusions, indicating the influence of melt fluxing on 
mineral growth. With increasing pressure, garnets become increasingly majoritic, 
with increasing quantities of octahedral silicon. 

Clinopyroxene was observed in all subsolidus experiments, as euhedral crys- 
tals that are often spatially associated with the carbon-bearing phase. Cpx abun- 
dance falls with increasing pressure and their compositions becoming increasingly 
dominated by sodic components (jadeite, aegerine and NaMgo sSir.5O¢) at high 
pressure (Extended Data Fig. 5). Cpx only disappears from the stable phase assem- 
blage in supersolidus experiments at 20.7 GPa. SiO, is observed in all runs and 
are small, often elongated tabular-shaped crystals. An oxide, either TiO at low 
pressure or an Fe-Ti oxide above 13 GPa (as described previously”*) are observed 
in all subsolidus runs. 

The carbon-bearing phase in subsolidus experiments changes with increasing 
pressure. At 3 GPa COz, marked by the presence of voids in the polished sample, 
is stable. This converts to dolomite at 7.9 GPa, consistent with the position of 
the reaction 2cs + dol = cpx + CO; (ref. 22). Beyond ~9 GPa dolomite becomes 
unstable and breaks down into magnesite + aragonite”*, Therefore, because the 
ATCM1 bulk composition lies on the Mg+Fe”*-rich side of the garnet-cpx join 
(Extended Data Figs 1a and 5), magnesite replaces dolomite as the carbon host 
in the experimental phase assemblage. This differs from experiments in previ- 
ous studies, where aragonite was dominant because bulk compositions fall on 
the opposite side of the garnet-cpx join. It is clear from the ternary diagrams 
(Extended Data Fig. 5) that while the tie-line between garnet and cpx remains, 
magnesite and aragonite cannot coexist ina MORB bulk composition. Finally, 
at pressures above 15 GPa, Na-carbonate becomes stable in the subsolidus phase 
assemblage. This is chemographically explained by the rotation of the garnet-cpx 
tie-line with increasing pressure (EDF5). Its appearance can also be justified as a 
necessary host of sodium at increasing pressure, since aside from clinopyroxene 
there is no other Na-rich phase stable on the Mg+Fe side of the maj-cpx join. 

The appearance of silicate melt, containing dissolved CO; (estimated by dif- 
ference), defines the solidus at 3 GPa. This may initially appear to contradict 
the results of some previous studies, which find carbonatite melts are produced 
near the solidus of carbonated eclogite at pressures lower than 7 GPa (refs 43, 45, 
46). However, this is easily explained by the differences in CO2 and SiO, con- 
tent used in these studies. The higher CO, and lower SiO contents of previous 
studies stabilize carbonate melt to lower temperatures relative to silicate melts. 
Indeed, we note that our results are consistent with those described previously’ 
(the two previous studies with the least depleted SiO2), which also observed that 
near-solidus melts below 5 GPa were basaltic to dacitic silicate melts containing 
dissolved CO. The results of one paper’? are not entirely self-consistent, in that 
at some pressures between 3.5 and 5.5 GPa the authors observed silicate melts 
before carbonate melts (4.5 and 5 GPa), whereas this relationship is sometimes 
reversed (5 GPa in AuPd capsules) or both melts were observed together (3.5 GPa). 
The observation of two immiscible melts in previous studies probably reflects the 
maximum CO, solubility in silicate melts. Since our bulk composition has less 
CO, akin to natural rocks, we do not observe liquid immiscibility. 

In all experiments above 7 GPa, near-solidus melt compositions are car- 
bonatititc and essentially silica-free. This result is notably different from those 
described previously”’, which reported that near-solidus melts were a mixture 
of silicate, carbonated silicate and carbonatite melts. We believe this contrast is 
caused by the interpretation of experimental run textures. Whereas ref. 19 iden- 
tified regions of fine-grained material consisting of mixtures of stable phases 
from elsewhere in the capsule as quenched melts, we have not followed the same 
interpretation of these features. Although we do recognize similar features in 
some run products, we have interpreted these features as a consequence of poor 
crystal growth in regions far from the influence of melt fluxing. In all supersoli- 
dus experiments, we observed regions of carbonatite material (typically <1 wt% 
SiO;) that is fully segregated from surrounding silicate minerals and possesses a 
typical carbonate-melt quench texture (Extended Data Fig. 3). Silicate minerals 
in close proximity to these melt pools are larger than those elsewhere in the same 
experiment, have well-defined crystal boundaries and contain few inclusions. 
Therefore, we attribute the variable texture and regions of fine-grained material 
present in experiments to the location of melt within experiments, which has a 
tendency to segregate to isolated regions of capsules under influence of tempera- 
ture gradients. Although melt segregation occurs in all supersolidus experiments, 
the efficiency of segregation and size of melt pools considerably increases with 
rising temperature above the solidus. Extended Data Figure 4 shows the highly 
systematic evolution of the melt compositions reported from our study with 
increasing pressure, strongly supporting our interpretations. 
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Carbonatite melts are calcic, Ca number > 0.5 (Ca number = Ca/ 
[Ca+Mg-+Fe]), despite subsolidus carbonates being dominated by magnesite 
(Extended Data Fig. 4). Melts have high concentrations of TiO, (typically 
1-3.5 wt%), P2Os (0.4-1.5 wt%) and K,O (0.3-1.5 wt%) and a variable Mg num- 
ber (0.33-0.7 defined as Mg/[Mg+Fe]). The alkali content of melts, strongly 
dominated by Na2O due to the bulk composition, increases with pressure (from 
1 to ~15 wt% Na,O at 7.9 and 20.7 GPa respectively; Extended Data Fig. 4). This 
increasing Na,O content is driven by the decreasing compatibility of Na2O in 
the residual mantle phase assemblages as the abundance of stable clinopyroxene 
falls. At 20.7 GPa the melt composition, as evidenced both by constant phase 
proportions and consistent melt/majorite compositions, remains constant over 
a temperature interval of ~350°C above the solidus. It is only when temperature 
reaches 1,530-1,600 °C (runs #16 and #31) that the silica content of the melt 
begins to increase (to 8.7 wt%) and CO; content falls as melts start to become 
silica-carbonatites. 

One experiment (#33) aimed to verify that measured low-degree melt composi- 
tions are accurate, and are not affected by analytical problems related to the small 
size of melt pools, was conducted at 20.7 GPa. In this experiment the abundance 
of carbonate melt was increased by adding a mix replicating the low degree melt 
composition ATCM2 to ATCM1 in a mass ratio of 1:3. If the composition of 
low-degree melts has been accurately determined in ‘normal’ experiments then 
this addition will have a negligible effect on phase relations or the compositions 
of the garnet, SiO» or melt; it would simply increase the melt abundance. The 
result of this experiment has a similar texture to all other experiments, where 
carbonatite melt segregates to one end of the capsule and is adjacent to large, well- 
formed majoritic garnets. The far end of the capsule has a much smaller crystal 
size, crystals have ragged edges, garnets are full of inclusions and SiO) is present 
along grain-boundaries and triple junctions (Extended Data Fig. 3h). Mineral and 
melt compositions, although not exactly identical, are similar to those measured 
in ‘normal’ experiments (to achieve identical compositions an iterative approach 
would be required that was not deemed to be necessary) thus confirming that 
near-solidus melt compositions have been accurately determined. The presence 
of fine-grained material away from segregated melt also acts to further confirm 
our hypothesis regarding the vital importance of melt presence for growing large 
crystals during experiments. 

Subsolidus carbonate species at high pressure. Comparing our starting material 
and results with those of previous studies using ternary and quaternary projec- 
tions (Extended Data Fig. 5) reveals that it is not possible for both magnesite 
and aragonite to coexist alongside majorite and clinopyroxene owing to stable 
mineral phase fields (see earlier). Thus, in Mg-Fe-dominated compositions, 
such as our starting material, magnesite is the stable carbonate at high-pressure 
subsolidus conditions. Whereas in Ca-dominated compositions aragonite will 
be the stable carbonate beyond the pressure of dolomite dissociation. Natural 
subducting MORB compositions, which contain, at most, a similar quantity of 
CO, to our bulk composition!!, almost all lie on the Ca-poor side of the majorite- 
clinopyroxene join (Extended Data Figs 1 and 5). In this situation, as our exper- 
iments demonstrate, cpx remains an important Na-host in MORB assemblages 
to high pressures alongside [Na,K]o 33Cao.67CO3 structured carbonate. Ca-rich 
compositions containing subsolidus CaCO; experience different phase relations 
because aragonite can dissolve considerable Na2O and so is the sole Na-host in 
these compositions. We conclude that because the majority of natural MORB 
rocks fall on the Mg+Fe side of the maj-cpx join, like our bulk composition, 
that the phase relations determined in this study are applicable to the case of 
natural subduction. Therefore, the melting point depression we observe along 
the carbonated MORB solidus at uppermost transition zone pressures is generally 
applicable to subducted oceanic crust. 

Melt-mantle reactions. Without the influence of slab-derived melts, the anhy- 
drous transition zone peridotite assemblage at 20.7 GPa and 1,600°C (experiment 
G168 and G176) is dominated by Na-poor majorite and wadsleyite (Mg num- 
ber = 0.90) (Extended Data Fig. 6, Extended Data Table 3 and Supplementary 
Table 5a). Upon reaction with the near-solidus alkaline carbonatite defined during 
melting experiments, ATCM2, a clearly defined reaction zone is observed between 
this ambient peridotite assemblage and the infiltrating melt (Extended Data Fig. 6). 
The products of this reaction are garnet containing a notable Na2X”* Sis0,2 major- 
ite component, Ca(Si,Ti)O3 perovskite, ringwoodite, ferropericlase and diamond. 
All of these phases were identified using Raman spectroscopy (Extended Data 
Fig. 7) and their compositions are presented in Supplementary Table 5a. Raman 
spectroscopy alone, which was performed before any sample polishing using 
diamond-based products, confirms the creation of diamond during these reac- 
tions. We have not observed diamond using SEM techniques and believe that it 
resides as sub-micrometre-sized inclusions in the various reaction-product min- 
erals where it is seen by spectroscopic methods. The experiments performed on 


intimately mixed powders of melt and pyrolite also form the same phase assem- 
blages (Extended Data Table 3) and mineral compositions from those runs are 
also presented in Supplementary Table 5b, c. 

We observed the reaction products as new crystals floating in the residual 
carbonatite melt and/or nucleated on the relics of the peridotite assemblage, 
thus creating zoned minerals. We have demonstrated that the composition of 
majorite minerals crystallizing during the reactions lie between those expected 
for peridotitic and eclogitic minerals at a similar pressure and possibly explain 
intermediate-composition diamond-hosted majorites (Fig. 2). We suggest that 
the full range of intermediate inclusion compositions might be created by the 
gradual shift in phase compositions, from those we observe towards more perid- 
otitic minerals as the melt composition reacts with increasing quantities of mantle 
material. Additionally we have shown that the compositions of calcium perovskite 
(Extended Data Fig. 8) and ferropericlase (Fig. 3) formed during the reactions are 
consistent with diamond-hosted minerals of those species. Further experiments, 
across the solidus ledge and into the uppermost lower mantle pressure range are 
required to test whether melt-mantle interactions account for all diamond-hosted 
inclusions. 
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Extended Data Figure 1 | Comparison of experimental compositions 
with natural rocks. a—f, ‘Fresh’ MORB rocks (red field), ALL-MORB7! 
(red circle), altered MORB rocks*! (pale blue circles), exhumed blueschist, 
greenschist and/or eclogitic rocks (yellow circles) and starting material 
from this (dark blue circle) and previous studies (green circles) of 
carbonated MORB compositions. In a, rocks altered MORB and exhumed 
rock compositions that fall on the Mg-Fe side of the maj-cpx join from 


40 42 44 46 48 50 52 54 
SiO, 

Extended Data Fig. 5 plot below the dashed line, compositions that lie on 
the Ca side of this join are plotted as orange circles with yellow outlines 
or purple circles with blue outlines and sit above the dashed curve. This 
confirms that magnesite will be the stable carbonate phase at high pressure 
in the vast majority of natural crustal rocks, as is the case for ATCM1. Data 
and corresponding references for this figure are provided in the online 
source data file. 
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Extended Data Figure 2 | Experimental results/phase diagram and 
interpreted solidus position. The reactions clinopyroxene + 

CO2= dolomite + 2coesite and dolomite = magnesite + aragonite are 
from refs 22 and 23 respectively. The upper left curve is the anhydrous 
MORB solidus. Note that due to temperature gradients in experiments 


Pressure (GPa) 


at 8 GPa, a small quantity of dolomite is observed coexisting with melt in 
one experiment above the solidus, present at the cold end of the capsule. 
arag, aragonite; CM, carbonatite melt; cpx, clinopyroxene; cs, coesite; dol, 
dolomite; gt, garnet; mag, magnesite; maj, majoritic garnet; Na carb, Na 
carbonate; ox, FeTi oxide; SM, silicate melt; st, stishovite. 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Figure 3 | BSE images of experimental products. h, sandwich experiment, 20.7 GPa, 1,400 °C. Scale bars, 10 1m. CM, 
a, 7.9 GPa, 1,250°C; b, 7.9 GPa, 1,350 °C; ¢, 13.1 GPa, 1,350°C; d, 13.1GPa;__—_carbonatite melt; cpx, clinopyroxene; dol, dolomite; FeTi, FeTi oxide; 
1,450 °C; e, 20.7 GPa, 1,100 °C; f, 20.7 GPa, 1,480 °C; g, 20.7 GPa, 1,600 °C; gt, garnet; mag, magnesite. 
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Extended Data Figure 4 | Composition of experimental melts from this study. a, b, Experimental melts from selected previous studies marked with 
semi-transparent greyscale symbols. b, The effects of increasing pressure, temperature and the effect of contamination due to partial analysis of silicate 
minerals surrounding small melt pools are shown. 
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Extended Data Figure 5 | The composition of experimental phases from _ projected onto the basal ternary. The yellow star plotted in the four- 
this study projected into two quaternary plots. a, b, [Ca]-[Mg+Fe**]- component system and projected onto the basal ternary is ATCM1 (our 
[Si+Ti]-[Na+K] (a) and [Mg+Fe?*]-[Ca]-[Al+Fe**]-[Na+K] (b). In bulk composition) while the black stars are bulk compositions from 
both diagrams the grey fields are the compositional data projected onto previous studies”>~*”. 


the basal ternary. The red field is the range of natural MORB compositions 


© 2016 Macmillan Publishers Limited. All rights reserved 


a reaction 


peridotite 


reaction 


wad + maj 


CaPy 
Extended Data Figure 6 | BSE images of reaction experiments. 
a-d, G169 (a, b) and G177 (c, d). In both experiments a reaction zone 
and remaining carbonatite melt surrounds the unreacted peridotite 
region. a, An overview of G169. b, A close up of the reaction in G169 


containing newly crystallized calcium perovskite, majorite, ferropericlase 


tw maj (new rims on old cores) 


_CaPv + maj 


wad + maj 


reaction 


CaPv 


and ringwoodite minerals. c, A close up of the reaction products in G177, 
which consist of small bright calcium perovskites, new majorite that is 
often observed as a rim on relic peridotitic garnet and ringwoodite. 

d, An overview of G177. CaPy, calcium perovskite; fper, ferropericlase; 
maj, majorite; rw, ringwoodite; wad, wadsleyite. 
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Extended Data Figure 7 | Raman spectra of minerals from reaction experiment G177 measured using a blue 455 cm! excitation laser. The position 
1 


of the main peaks in each collected spectrum have been labelled with their shift from the excitation laser in cm~*. 
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Extended Data Figure 8 | Comparison of diamond-hosted calcium 
perovskite inclusions with experimental mineral compositions in 
MgO versus Ti number space. Ti number = Ti/[Ca+Ti]. Data and 
corresponding references for this figure are provided in the online source 
data file. 
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Extended Data Table 1 | Starting materials used in this and previous studies 


ALL pil — i 43 42 19,44 19 45 45 45 46 —— 31 32 
MORB”' 1256 (this SLEC1™ OTBC™ GAticc”™” VOLGAcc” SLEC2” SLEC3™” SLEC4™” G2C (this KR4003° PM1 
MORB" _ study) study) 

SiO, 50.47 51.48 50.35 41.21 47.23 45.32 42.22 30.29 41.69 4401 44.38 0.71 44.90 45.74 
TiO, 1.68 1.44 1.33 2.16 - 1.34 1.43 1.58 2.18 2.31 175 1:52 0.16 0.21 
Al,O; 14.70 14.18 13.66 10.89 15.35 14.88 15.91 7.95 11.02 1163 13.98 0.33 4.26 5.01 
Cr,0; 0.41 0.27 
FeO 10.43 11.90 11.35 12.83 8.93 8.85 9.46 13.87 11.03 11.65 10.11 8.43 8.02 8.07 
MnO 0.18 0.22 0.21 0.12 - 0.15 0.14 0.23 0.12 0.13 0.25 - 0.13 0.14 
MgO 7.58 7.30 7.15 12.87 6.24 7.15 7.64 14.28 11.07 11.68 8.54 6.50 37.30 34.57 
CaO 11.39 10.78 10.80 13.09 14.77 14.24 14.85 14.88 16.89 14.70 12.69 23.21 3.45 3.86 
Na,O 2.79 2.53 2.48 1.63 2.91 3.14 3.36 1.75 1.40 1.48 3.29 16.35 0.22 0.59 
K,0 0.16 0.06 0.06 0.11 0.02 0.40 0.42 0.13 0.10 0.10 0.03 0.57 0.09 0.08 
P.O; 0.18 0.11 0.10 0.14 0.15 0.37 
NiO 0.24 0.20 
co, 2:52 5.00 4.43 4.40 4.40 14.99 4.42 2.21 5.00 42.01 
Total 99.57 100.00 100.00 99.91 99.88 100.01 99.98 99.95 99.92 99.90 100.02 100.00 98.18 98.74 
Ca# 0.38 0.36 0.36 0.32 0.48 0.46 0.45 0.33 0.41 0.37 0.39 0.60 0.06 0.07 
Mg# 0.57 0.52 0.53 0.64 0.56 0.59 0.59 0.65 0.64 0.64 0.60 0.58 0.89 0.88 


Ca number = Ca/[Ca+Mg+Fe]. Mg number = Mg/[Mg+Fe]. 
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Extended Data Table 2 | Summary of run conditions and products for carbonated MORB melting experiments 


P (GPa) T(°C) time (min) capsule run products comments gt cpx cs/st ox dol mag Nacarb SM CM CO, Fr 
162G 3 1100 7200 AUsoPdao gt, cpx, cs, rut, CO, 29.96 59.02 7.84 0.24 2.52 0.12 
161G 3 1150 = 7200 AUspP dao gt, cpx, cs, rut, CO, 27.69 61.07 7.87 0.20 2.52 0.16 
#5 3 1250 2880 AugyPdag 9b i i SM, 28.01 53.93 5.76 0.08 9.71 2.03 0.55 

2 
#32 5.1 1300 1560  AusoPdzo gt, cpx, cs, rut, SM 30.22 40.81 7.92 20.97 0.20 
#9 79 1250 1470 Au mai, cpx, cs, rut, dol S°/omite segregated atone 59 39 39.05 12.07 0.83 5.48 0.18 
#10 79 1300 1440 Au Mal, oo mertiin eee “C above 45 43 34.87 12.53 0.51 6.57 0.12 
hai opsice: uk melt adjacent to TC, dolomite 
#14 7.9 1350 1770 AuysPdas TAs HP Th at cold end (~ 50 °C below 44.49 34.96 13.62 0.63 1.95 4.42 0.11 
, TC) 

#34 7.9 1400 1440 PtRe maj, cpx, cs, CM Fe loss to capsule 
#29 13.1 1200 1440 Au Mal, fut. magnesite ai at One 64 37 14.85 15.31 0.49 4.75 0.39 
#40 13.1. 1280 120 Au maj, pine rut, scab seiege after2 63.45 16.14 14.86 0.40 4.82 1.02 
#12 13.1. 1350 1440 Au maj, pie OE anggnesto a at one §3 66 16.20 15.66 0.11 4.13 3.70 
#39 13.1. 1350 1560 Au maj, pie rut, repeat of #12 62.88 15.36 15.88 0.52 4.33 2.89 
#17 13.1 1400 1440 Au maj, cpx, st, CM very small melt pools 61.89 18.82 15.27 6.45 0.34 
#30 13.1. 1450 1500 Au __ maj, cpx, st, CM 63.69 15.55 14.60 5.86 0.10 
#37 15.3 1115 2940 Au i ot es 65.50 14.56 14.00 0.87 4.52 0.77 
#38 15.3 1190 3150 Au So 64.39 12.04 17.38 0.39 5.44 0.23 
#35 15:3) 1250 1920 Au maj, cpx, st, CM small melt pools 64.14 13.50 15.74 6.31 0.13 
#36 20.7 1100 3240 Au — Paes 64.76 12.43 15.27 2.00 3.17 2.38 0.05 
#27 20.7 1200 1440 Au maj, st, CM Re furnace, small melt pools 76.30 17.56 5.50 0.37 
#28 20.7 1300 1440 Au maj, st, CM Re furnace 75.32 17.71 5.88 0.25 
#19 20.7 1400 3660 Au maj, st, CM 75.14 18.42 5.81 0.46 
#11 20.7 1450 1440 Au maj, st, CM 76.15 17.66 5.59 0.46 
#13 20.7 1480 1650 Au maj, st, CM 76.75 17.03 5.89 0.14 
#16 20.7 1530 1440 Au maj, st, CM small melt pools 76.48 17.41 5.99 1.03 
#31 20.7 1600 600 Au maj, st, CM 75.10 17.28 7.37 0.34 
#33 20.7 1400 1440 Au maj, st,CM 75% ATCM1 + 25% ATCM2 61.48 10.41 27.58 2.16 


Mass balance calculations were performed as described in Supplementary Information. Phase proportions are in wt%. CM, carbonatite melt; cpx, clinopyroxene; cs, coesite; dol, dolomite; FeTi oxide, 
iron-titanium-rich oxide phase; gt, garnet; mag, magnesite; maj, majoritic garnet; Na carb, sodic carbonate; rut, rutile; SM, silicate melt; st, stishovite. 
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Extended Data Table 3 | Summary of reaction experiments run conditions and experimental products 


Type Starting Material Expt No. T (°C) pay Mineral phases 
Synthesis KR4003 + 2.5 wt.% Fe G168 1600* 510 wad, maj, capv, Fe 
~ G176 1590 480 wad, maj, CM* 
Reaction G168 + ATCM2 G169 +1600 30 maj, capv, rw, fper, CM, maj‘, wad’, capv', Dit 
G176 + ATCM2 G177__—- 1590 124 maj, capv, rw, mag, CM, Al,Os, maj', wad", Di* 
Mixture PM1:ATCM2 (wt.%) in Fe capsules) 
9:1 Y17b 1400 180 fper, maj, rw, wad, capv*, melt, Fe, Di 
t3 Y16a 1400 270 fper, maj, rw, wad, capv*, melt, Fe, Di 
11 Y16b 1400 270 fper, maj, rw, wad, capv*, melt, Fe, Di 
PM1:ATCM2:Fe (mol.%) in an Au capsule 
16:4:1 G183b 1400 maj, rw, capv, mag, CM 


capy, calcium perovskite; CM, carbonatite melt; Di, diamond; Fe, Fe metal; fper, ferropericlase; mag, magnesite; maj, majorite garnet; rw, ringwoodite; wad, wadsleyite. 
*Trace/minor phase. 

tRelics of the peridotite starting material. 

+Thermocouple broke during run; temperature estimated using power curves with maximum uncertainty of 150°C. 
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Holocene shifts in the assembly of plant and animal 
communities implicate human impacts 


S. Kathleen Lyons!, Kathryn L. Amatangelo?, Anna K. Behrensmeyer', Antoine Bercovici!, Jessica L. Blois?, Matt Davis, 
William A. DiMichele!, Andrew Du”, Jussi T. Eronen®, J. Tyler Faith’, Gary R. Graves*’, Nathan Jud!®"", Conrad Labandeira!!?, 
Cindy V. Looy™, Brian McGill", Joshua H. Miller!°, David Patterson®, Silvia Pineda-Munoz”, Richard Potts!®, Brett Riddle”, 


Rebecca Terry’, Aniké Toth!, Werner Ulrich”!, Amelia Villasefio: 
Donald Waller?’ & Nicholas J. Gotelli2* 


Understanding how ecological communities are organized and 
how they change through time is critical to predicting the effects 
of climate change'. Recent work documenting the co-occurrence 
structure of modern communities found that most significant 
species pairs co-occur less frequently than would be expected 
by chance”. However, little is known about how co-occurrence 
structure changes through time. Here we evaluate changes in 
plant and animal community organization over geological time 
by quantifying the co-occurrence structure of 359,896 unique 
taxon pairs in 80 assemblages spanning the past 300 million years. 
Co-occurrences of most taxon pairs were statistically random, 
but a significant fraction were spatially aggregated or segregated. 
Aggregated pairs dominated from the Carboniferous period 
(307 million years ago) to the early Holocene epoch (11,700 years 
before present), when there was a pronounced shift to more 
segregated pairs, a trend that continues in modern assemblages. 
The shift began during the Holocene and coincided with increasing 
human population size*” and the spread of agriculture in North 
America®”. Before the shift, an average of 64% of significant pairs 
were aggregated; after the shift, the average dropped to 37%. 
The organization of modern and late Holocene plant and animal 
assemblages differs fundamentally from that of assemblages over 
the past 300 million years that predate the large-scale impacts of 
humans. Our results suggest that the rules governing the assembly 
of communities have recently been changed by human activity. 
How are plant and animal communities organized, and does their 
structure change through time? This question has dominated many 
decades of research on community assembly rules and is critical 
to charting the future of biodiversity’. Whereas most studies have 
described overall community structure with simple indices such as 
species richness® and average co-occurrence’, some analyses catego- 
rize individual species pairs in assemblages as random, aggregated, or 
segregated”?. Segregated species pairs may be generated by processes 
such as negative species interactions, distinct habitat preferences, and 
dispersal limitation. Aggregated species pairs may be generated by 
processes such as positive species interactions, shared habitat prefer- 
ences, and concordant dispersal’. Recent meta-analyses document an 


r°, Scott Wing!, Heidi Anderson”, John Anderson”, 


excess of segregated species pairs in modern communities: most sig- 
nificant species pairs co-occur less frequently than would be expected 
by chance’. The relative dominance of segregated versus aggregated 
species pairs suggests an important role for biotic interactions such as 
competition and predation, habitat selectivity, and dispersal limitation 
in structuring modern communities. 

Do the patterns of species segregation that characterize modern 
assemblages also hold in the fossil record, or is the present different? 
If there was a change, when did the modern condition arise? There are 
many examples from the fossil record of times of major reorganization 
in ecological communities, such as a shift in the complexity of marine 
invertebrate communities after the end-Permian mass extinction!. 
But even during the lengthy periods between mass extinctions, the 
nature of species interactions may change. For example, the diver- 
sity and intensity of insect herbivory increased during a warming 
trend from the Late Palaeocene to the Eocene!”. Moreover, many late 
Pleistocene plant and animal assemblages that contain some extant 
species have no modern analogues!*"“. Such results hint that general 
patterns of species associations observed in contemporary assemblages 
could have been quite different in the past. 

Here we ask whether non-random species associations of plant and 
mammal assemblages over the past 300 million years (Myr) are domi- 
nated by segregated or aggregated species pairs. This novel analysis is 
designed to compare statistical patterns of taxon associations for fossil 
and modern data using a consistent set of methodologies. We analysed 
80 well-sampled fossil and recent assemblages: 38 for mammals and 
42 for plants (see Supplementary Information, Extended Data Fig. 1 
and Extended Data Table 1). Each data set contained information on 
taxon presence and absence across multiple localities in a particular 
time period (Extended Data Fig. 1 and Extended Data Table 1). Ages 
of plant data sets range from 307 million years ago (Ma) to the present 
and are from North America and Africa. Mammal data sets range in 
age from 21.4 Ma to the present and are from North America, Eurasia, 
and Africa. We compared each data set to a ‘null’ assemblage gen- 
erated by randomization, scored each taxon pair as random, aggre- 
gated, or segregated, and used an empirical Bayes approach to control 
for the rate of false positive discoveries!*; see Methods). Finally we 
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Figure 1 | Proportion of aggregated pairs over the past 300 Myr. 
Weighted Loess curve with shaded 95% confidence intervals illustrates 
reduction in the proportion of aggregated species pairs in the Holocene 
(log scale). Dotted vertical line at 5,998 years delineates the linear model 
breakpoint in the trend (Methods and Extended Data Fig. 2). Non-random 
species pairs of ‘Fossil’ data (blue density profile) are predominantly 
aggregated, whereas ‘Moderr data (red density profile) are predominantly 
segregated. Colours indicate continent: North America (green), Eurasia 
(purple), Australia (dark grey), South America (dark blue), Africa 


synthesized our results with those from a meta-analysis of 39 modern 
communities that used the same methodology”””. 

For all fossil data sets, most taxon pairs were random (87-100% 
of possible pairs; Extended Data Table 1), which is also typical for 
modern assemblages’. This result reflects the statistically conservative 
nature of the tests used to identify significantly associated pairs, and 
the fact that most taxon pairs in a diverse, well-sampled assemblage 
interact weakly, or not at all. In 62 of 80 assemblages analysed here, a 
subset of taxon pairs showed significant associations that are stronger 
than can be explained by the null model, even after controlling for the 
false discovery rate (Fig. 1). Unlike modern mainland assemblages, 
most significant associations in the fossil record are aggregated, pos- 
itive associations (Fig. 1). This pattern is consistent across the past 
300 Myr for the diverse fossil assemblages in this study, which encom- 
pass mammals, plant macrofossils, and pollen from multiple conti- 
nents and time slices. 

However, beginning in the Holocene, there was a significant tem- 
poral trend towards a greater proportion of segregated species pairs, 
which is consistent with the results for modern assemblages. A break- 
point analysis indicates that the shift began approximately 6,000 years 
ago (Extended Data Fig. 2). Confidence intervals of the breakpoint 
are large owing to a lack of appropriate data sets between 20,000 and 
1 million years ago. Therefore, it is difficult to pinpoint the exact time 
of the shift, but a closer examination of the data suggests that placing 
it within the Holocene is reasonable. Before the breakpoint, on average 
64% of significant pairs were aggregated (median = 73%). After the 
breakpoint, the average dropped to 37% (median = 42%). This trend 
is not driven by the modern data and persists when only fossil data are 
analysed (Extended Data Fig. 3). 

Why are species associations so different in fossil versus modern 
assemblages? We first tested and eliminated five potential ‘artefact’ 
hypotheses that are related to sampling issues (see Methods for 
details). (1) Collection modes were discounted because they were 
heterogeneous both for the modern and for fossil assemblages, and 
because the decrease in aggregated pairs was strong in fossil pollen 


(orange). Point shapes indicate type of data: pollen (square), mammals 
(triangle), macroplants (circle). Data on terrestrial communities from 
ref. 2 are diamonds. All fossil and modern data are from mainland sites; 
no island sites were included. Time values of modern data points were 
assigned a single age (see Supplementary Information data sets), but are 
jittered for visual representation. P-T, Permo-Triassic transition; K-Pg, 
Cretaceous—Palaeogene transition; PETM, Palaeocene—Eocene thermal 
maximum. 


and mammal assemblages that spanned the shift. Moreover, sampling 
methodology was consistent within an assemblage type across periods 
that encompass the change (Extended Data Fig. 4). (2) Scale was 
discounted because there was no relationship between the spatial or 
temporal extent and grain of each data set and the percentage of aggre- 
gated pairs (Fig. 2 and Extended Data Fig. 5). (3) Taphonomic bias was 
discounted because the null model algorithm preserved the marginal 
totals of the data matrix in each randomized assemblage, controlling 
for simple taphonomic biases that could generate heterogeneity in the 
number of species per site or the number of occupied sites per species. 
(4) Taxonomic resolution was discounted because parallel analyses at 
the genus and species levels did not produce systematic changes in the 
proportions of aggregated pairs (Extended Data Table 2). (5) Increased 
sampling of rare species in modern data sets was discounted because 
segregated pairs tend to form in species with intermediate occupancy, 
whereas aggregated pairs form both in common and in rare species in 
modern and fossil data sets. All of these mechanisms can potentially 
affect assemblage structure in fossil and modern data sets. However, 
our analyses suggest that these mechanisms cannot account for the 
prominent decrease in aggregated species pairs that began during the 
Holocene (Fig. 1). 

The failure of sampling issues to account for the temporal change in 
the percentage of non-randomly associated taxon pairs suggests that a 
mechanistic explanation is required. We consider two hypotheses that 
invoke a systematic change in either abiotic or biotic factors as drivers 
of co-occurrence patterns. 

One of the most obvious differences between the present interval 
and the past 300 Myr of geological history represented by these fossil 
assemblages is the increasing variability of climate towards the pres- 
ent, associated with the glacial-interglacial cycles of the Quaternary 
period’®. This is not to say that there were no periods of high climate 
variability before the ice ages, but that our data do not regularly sample 
times of high climate variability in deep time. If climate variability is 
responsible for the shift in the frequency of aggregated species pairs, 
there should be a negative relationship between climate variability 
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Figure 2 | Relationship between scale and proportion of aggregated 
pairs. The proportion of significant pairs that are aggregated does not 
depend on the temporal or spatial scale of data. Each point represents a 
single data set. a, b, Aggregated pairs versus spatial extent (longest linear 
distance between any two sites in a data set; a) or spatial grain (estimated 
radius of collection area that fossil specimens would have been transported 


and the percentage of aggregations. We quantified climate variability 
within the temporal extent of each data set for the past 65 Myr, using 
climate data from ice!” and deep sea’ cores that were standardized 
to a common scale (Methods). We found no relationship between 
the proportion of aggregated pairs and the standard deviation of 
climate within the sampled time slice (Extended Data Figs 6 and 7), 
or the standard deviation of the first differences of climate within the 
sampled time slice (Fig. 3a). Collectively, these results suggest that 
the increasing variability in climate in the Quaternary cannot explain 
the decreased frequency of aggregation. 

An alternative explanation is that the mid- to late Holocene is 
unusual in the evolutionary history of terrestrial ecosystems, and that 
biotic drivers (as opposed to climate) now are different from what they 
have typically been over the past 300 Myr. First, we asked whether there 
was a significant shift in the proportions of aggregated versus segre- 
gated pairs across critical geological intervals that spanned periods of 
mass extinctions or major climatic change during the past 300 Myr 
(Extended Data Table 3 and Extended Data Fig. 4). We found a signifi- 
cant decrease in the percentage of positive associations only in data sets 
that spanned the Pleistocene—-Holocene transition (11,700 years ago). 
With the exception of large-bodied mammals in North America and 
Africa (Extended Data Fig. 4b), aggregated species pairs decreased in 
all data sets through the Pleistocene—Holocene transition. In contrast, 
there was no significant change in the percentage of aggregations across 
the three other critical intervals that were encompassed by these data: 
the Permo-Triassic transition (252 Ma), the Cretaceous—Palaeogene 
transition (66 Ma), and the Palaeocene-Eocene thermal maximum 
(56 Ma). These intervals include the Cretaceous—Palaeogene mass 
extinction, responsible for the loss of the non-avian dinosaurs'®, and the 
Permo-Triassic extinction, the largest mass extinction ever recorded!®, 
Even the Palaeocene-Eocene thermal maximum, a period of major cli- 
matic change in which global temperatures increased ~5-8°C in a few 
millennia’, did not coincide with a change in the relative proportions 
of aggregated versus segregated pairs. 
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Temporal grain (yr) 
to the depositional environment in a typical locality; b). c,d, Proportion 
of aggregated pairs versus temporal extent (duration from the oldest to 
youngest locality in a data set; c) or temporal grain (typical amount of 
time-averaging of localities in a data set; d). Colours and shapes as in Fig. 1. 
Note the logarithmic scale of the x axes. Modern data from ref. 2 are 
excluded from this analysis. 


It is difficult to pinpoint the exact mechanism responsible for the 
uniqueness of the present time interval. However, our analyses provide 
some clues about possible cause. Data that encompass the shift towards 
the modern pattern are almost exclusively North American (Fig. 1 
and Extended Data Fig. 4). The statistical confidence interval brack- 
eting the breakpoint at 6,000 years ago encompasses the beginning 
of agriculture in North America around 8,000 years ago® and the 
increase in human populations during the Holocene*®. The trend 
towards greater segregations in North American pollen (Fig. 1 and 
Extended Data Fig. 4), with particularly strong shifts occurring in 
the past 2,000 years”, is also consistent with the history of agricul- 
ture in North America. Cultivation of multiple species of domesti- 
cated plants began approximately 3,800 years ago®”’, with evidence for 
more general dependency on agriculture in North America beginning 
1,300 years ago’. Estimates of global land area under cultivation 
increase rapidly starting 6,000 years ago and are as high as 4 x 108 
hectares (1 hectare = 10* m’) by 2,000 years ago*”. 

Possible drivers by which increasing human impacts led to an 
increase in segregated pairs include (1) increases in hunting and 
domestication of particular species?! ?, (2) changes in land use*, 
(3) increases in the frequency of fire’, (4) increases in habitat fragmen- 
tation and dispersal barriers**4, and (5) deliberate and accidental 
spread of species beyond their native geographical ranges”*?”. We 
note that modern island assemblages (which we excluded from our 
comparisons with fossil assemblages) are more segregated than mod- 
ern mainland assemblages (Fig. 3b), which is consistent with the 
hypothesis that habitat fragmentation and dispersal limitation have 
increased segregated pairs. Possibly all of the processes listed play 
a role. Although their combined effects on taxon pairs are difficult 
to predict, the relative importance of factors structuring species co- 
occurrence appears to have changed through the Holocene. Future 
work comparing the co-occurrence structure of fossil and modern 
communities should allow us to better understand how this altera- 
tion will play out in the future. Regardless of the precise mechanisms, 
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Figure 3 | Tests of possible mechanisms for decreasing spatial aggregation 
through time. a, Climate variability within the temporal extents of the 
fossil data sets is uncorrelated with the proportion of aggregated species 
pairs. Climate variability is measured as the standard deviation of the first 
differences in climate across the interval (see also Extended Data Fig. 6). 

b, Box plots show the proportion of aggregated species pairs for fossil data, 
modern islands and modern mainland assemblages. Dashed lines indicate 
maximum and minimum values, circles are outliers. Island assemblages, 
with more limited capacity for dispersal, have the smallest and least variable 
fraction of aggregated pairs. Mainland fossil assemblages are significantly 
more aggregated than mainland modern assemblages. Note, these island 
data were excluded from other analyses. 


humans appear to be agents of disturbance on a large scale and have 
been so for longer than is often recognized. 

Our results suggest that assemblage co-occurrence patterns 
remained relatively consistent for 300 Myr but have changed over 
the Holocene as the impact of humans has dramatically increased. 
Across shallower time intervals, other studies have documented that 
local and regional species composition has changed substantially over 
recent decades”*”® and millennia*”. The rules governing community 
assembly, at least as implicated by co-occurrence patterns, seem to 
have changed during the Holocene and continue to change with the 
increasing influence of human activity. The co-occurrence structure 
of modern and recent plant and animal assemblages thus appears to 
be unique in the evolutionary history of terrestrial ecosystems, an 
important perspective for assessing challenges to these ecosystems in 
the face of present and future human impacts. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


No statistical methods were used to predetermine sample size. 

Detection of non-random species pairs. The data for each analysis consist of a 
binary presence-absence matrix in which each row is a taxon and each column is 
a sample. The entries represent the presence (1) or absence (0) of a particular taxon 
in a particular sample. Within this matrix, each of the S(S— 1)/2 unique species 
pairs is tested and classified as random, aggregated, or segregated. The tests were 
performed with the PAIRS version 1.0 software application'>*!. The methodology 
is described fully in ref. 2 and is briefly described here. 

The analysis begins by calculating a scaled C score’: Cy = (Ri- D)(Rj- D)/RiR;, 
where Cj is the C score for species pair i and j, R; is the row total (the number of 
species occurrences) for species i, Rj is the row total for species j, and D is the 
number of shared sites in which both species are present. For each species pair, 
this index ranges from 0.0 (aggregation: maximal co-occurrence of both species) 
to 1.0 (segregation: minimal co-occurrence of both species). PAIRS calculates the 
C score for each pair of species and assigns it to a histogram bin. There are 20 bins 
that range from 0 to 1 in 0.05 intervals, plus a bin for 0.0 (perfectly aggregated pairs) 
and a bin for 1.0 (perfectly segregated pairs). 

We next estimate the P value associated with each species pair by a randomi- 
zation test. The data matrix is first randomized by reshuffling all matrix elements, 
with the restriction that the row and column sums of the original matrix are pre- 
served. This ‘fixed-fixed’ algorithm has been subject to extensive benchmark test- 
ing with artificial random and structured matrices”*>", Compared with a variety 
of other null model algorithms, the fixed-fixed algorithm most effectively screens 
against type I errors (incorrectly rejecting the null hypothesis for a random matrix), 
but is somewhat conservative*’. 

An alternative algorithm ‘fixed-equiprobable’ retains row sums (species occur- 
rence frequencies), but allows column totals (species richness per site) to vary 
freely. The fixed-equiprobable algorithm also has good statistical properties, and is 
appropriate for modern data sets in which sampling effort has been standardized, 
such as quadrat samples of fixed area. However, this algorithm is not appropriate 
for fossil data because the number of species detected per site in fossil assemblages 
is determined primarily by sampling effort of the collector and by site-specific 
taphonomic biases in preservation. 

For these reasons, we have used only the fixed-fixed model, both for the analysis 
of fossil assemblages and for comparison with modern assemblages. Details of the 
randomization are discussed further in refs 2, 35. Using 1,000 randomizations, 
we create a simple P value (two-tailed test) for each species pair, which leads to a 
classification of each species pair as aggregated, random, or segregated. 

However, with a total of S(S — 1)/2 unique pairs in a matrix of S species, retaining 
all of the significant pairs (P < 0.05) would generate a potentially large number 
of false positive results. This problem has frequently arisen in the analysis of 
micro-arrays, genomic surveys, and other examples of ‘big data’*°. The PAIRS 
analysis relies on an empirical Bayes approach by creating a histogram of C score 
values based on the pairs generated in each null assemblage. To screen out false 
positives, we calculated the average number of species pairs in each bin of the 
histogram. Next, we determined the observed number of pairs from the empir- 
ical assemblage in each bin, ordered by their P values from the simulation. We 
retained only those pairs that were above the mean number for each bin and were 
statistically significant on the basis of the simple P value criterion. This double 
screen effectively eliminates many of the false-positives that can arise in random 
data sets”. 

Weighted Loess regression. A Loess smoothing line was created with the 
stat_smooth function in the R package ggplot2 version 1.0.0 (ref. 37) using default 
parameters. For Loess fitting, the fit at point x is made using points in the neigh- 
bourhood of x (closest 75% of total points), with tricubic weighting (proportional 
to (1 — (distance/maximum distance)*)*). Points were additionally weighted by 
the number of sites in each matrix and 95% confidence intervals were generated 
using a t-based approximation. 

Analysis of climate variability. To examine the how climate variability impacts 
the percentage of aggregated species pairs, we used climate proxy data from ice!” 
and deep sea cores!®, which collectively encompass the past 65 Myr of the assem- 
bled data sets. The European Project for Ice Coring in Antarctica (EPICA) data 
were used preferentially when there was temporal overlap between proxy data sets. 
Climate data were mean centred and standardized before pooling into a single 
data time series. We then sampled the climate data across the ‘temporal extents’ 
(Extended Data Table 1) of the individual Evolution of Terrestrial Ecosystems 
Program (ETE) data sets to test if there were relationships between the percentage 
of aggregated species pairs and climate variability. Climate variability was calcu- 
lated in two ways: (1) as the standard deviation of climate within the temporal 
extent of each data set and (2) as the standard deviation of the first differences 
(changes in climate from available time-step to time-step within the temporal 
extent of a data set) of climate. We used standard deviation because it helps address 


issues with changes in data density over time. Estimated rates of change are 
sensitive to the time span over which they are measured and more closely spaced 
data would shift apparent rates of change. Approaches using standard deviation 
are less sensitive to this issue. We also compared climate variability with age (years 
before present) of ETE data sets to test for Phanerozoic-scale trends in climate 
variability sampled by ETE data sets. 

Breakpoint analysis. We used a maximum likelihood approach, available in the 
R package ‘segmented’ version 1.1, to estimate the breakpoint time at which the 
sharper decline in aggregated species pairs began. This analysis used an initial 
linear model of the proportion of aggregated pairs as a function of community 
age (logio of years before present) to generate a best-fitting number of breakpoints, 
with separate regression lines fit to each segment. A bootstrap of 1,000 replicates 
was used to estimate uncertainty in the model parameters (including uncertainty 
in the time of the breakpoint). 

Tests of artefacts. Collection modes. We thought that differences in the way fossil 
and modern data are collected might be responsible for the observed difference in 
the relative proportions of aggregated versus segregated species pairs in modern 
communities”? and fossil communities. There are two reasons why collection 
modes are not likely to be responsible for this difference. First, fossil collections 
are heterogenous by nature. Different collecting methods are used for different 
taxonomic groups (for example, bulk sampling, surface sampling, cores). Moreover, 
even within a taxonomic group, the type of depositional environment imposes 
different kinds of bias (for example, cave sites versus open pits for Pleistocene 
mammals). Second, we see a switch from species pairs that are dominated by 
aggregations to those dominated by segregations in our data sets that span the 
Pleistocene—Holocene transition (Extended Data Fig. 4 and Extended Data 
Table 1). In particular, mammal assemblages show a switch from >50% aggrega- 
tions in the Pleistocene to <50% aggregations in the Holocene. The data encom- 
passing this switch are all fossil localities and there are similar biases in both time 
slices. Although there is variation in the pollen assemblages, a weighted regres- 
sion that takes into account the sampling in each time slice shows a significant 
decrease through time (P=0.04, R’=0.15). This trend of increasing percentage 
of segregated pairs begins approximately 14,000 years ago and continues across 
the Holocene with the switch occurring in the final 1,000 year time slice”°. The fact 
that these data were all collected using the same sampling techniques suggests that 
sampling cannot account for this pattern. 

Issues of scale. It is generally assumed that fossil data are biased. Although the 
type of bias is not the same for all taxonomic groups, most fossil assemblages 
contain some degree of temporal or spatial averaging**. That is, they represent 
accumulations of species that can occur over hundreds or thousands of years and 
may mix species that did not exist at the locality at the same time’. The fossil data 
sets in this analysis include assemblages that range from no time-averaging (for 
example, fossil leaves preserved in volcanic event beds) to those that are time- 
averaged over thousands or hundreds of thousands of years (for example, some 
mammal assemblages). In addition, some data sets could not be resolved to time 
bins of less than a million years. Spatial averaging is less of an issue in these data 
sets, but individual samples are drawn from areas with diameters ranging from a 
few metres to more than 300 km (Supplementary Table 1). 

If issues of scale are contributing to the pattern found here, there should be a 
relationship between the proportion of significant pairs that are aggregated and 
the spatial or temporal scale of the data. We evaluated this by estimating the spatial 
or temporal grain and extent of each data set included in the analyses (Extended 
Data Table 1) and determining if there was a significant relationship with the per- 
centage of aggregations. The spatial grain is the estimated radius of collection 
area over which fossil specimens would have been transported to the depositional 
environment in a typical locality. The temporal grain is the typical amount of 
time-averaging of localities in a data set. Spatial extent is the longest linear distance 
between any two sites in a data set and temporal extent is the duration from the 
oldest to youngest locality in a data set. 

We found no relationship between the scale of the data sets and the proportion 
of significant pairs that were aggregated versus segregated (Fig. 2 and Extended 
Data Fig. 5). Regression analyses were not significant and explained very little of 
the variation in the data (Extended Data Fig. 6). The pattern of segregated versus 
aggregated pairs was not different in fossil versus modern assemblages because of 
biases related to the scale of fossil data. 

Taphonomic bias. How can taphonomy and palaeoenvironment affect species 
frequencies (richness) and spatial representation? The fossil record contains buried 
assemblages of species that were derived from living communities at different 
times in the past. Species representation (presence or absence) in individual fossil 
assemblages is a critical attribute of our data sets, therefore we need to consider 
how this variable might be biased relative to original associations of species in 
communities. Taphonomic processes operate during the transition of dead remains 
into preserved samples and thus control the biological information that passes 
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from the living community into the fossil record*®. These processes act as filters 
that can alter species representation in fossil samples in a variety of ways: (1) selec- 
tive preservation of organisms with particular body compositions and sizes, for 
example organisms with and without mineralized skeletons, larger versus smaller 
individuals; (2) variable preservation of organisms depending on their popula- 
tion abundance, spatial distribution and life habits, for example aquatic versus 
terrestrial; (3) post-mortem or depositional mixing of species that did not live 
together (time-averaging), or separation of species that did (selective transport 
or destruction). Additionally, some types of environment are better represented 
in the depositional record than others, such as wetlands versus dry land surfaces. 
All of these add up to potential biases that could affect biological signals and the 
proportions of random versus significant species pairs, or the proportions of seg- 
regated versus aggregated pairs, in our analyses. 

However, the particular null model algorithm used effectively controls for major 
sources of taphonomic bias in the data set. This ‘fixed-fixed’ algorithm* creates 
null assemblages that have the same species richness per sample, and the same 
number of occurrences per species, as the original data. Thus, if there are preser- 
vation biases that generate heterogeneity in the total number of fossil species per 
sample, or biases in the number of specimens per species, these are effectively 
controlled for in the analysis. Significant patterns of species aggregation are those 
measured beyond the effects of sampling heterogeneity in the occurrences of spe- 
cies or the number of species per sample. Similar sampling effects are controlled 
for in the modern data, which can also exhibit variation in the commonness or 
rarity of species and in the number of species per sample. 

Taxonomic resolution of the data. Fossils are not always resolvable to the 
species level and are frequently analysed at the genus level. This may have the 
effect of increasing geographical ranges and overlap between taxa, and may 
contribute to the dominance of aggregated pairs found in this study. To test 
whether this was the case, we analysed 18 of the data sets at the species and 
genus level (16 mammal and 2 plant data sets). If taxonomic resolution is driving 
the pattern, we expect to see an increase in the proportion of aggregated pairs 
when species are lumped into genera. We found that six of the data sets showed 
the expected increase. However, nine showed a decrease and three showed no 
change (Extended Data Table 2). Interestingly, one of the modern data sets on 
small mammals from the Great Basin had genetic information that indicated 
that some were cryptic species. When the analysis was re-run with the cryp- 
tic species identified, there was an increase in the proportion of significantly 
aggregated pairs (from 50% to 61%). This is in the opposite direction that we 
would expect if lumping species into genera artificially increased aggregated 
pairs. Taken together, these results suggest that the differences between species 
associations over the past 300 Myr and the present are not driven by the taxo- 
nomic resolution of fossil data. 
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Sampling of abundant and rare species in fossil and modern data. The results of 
null model analyses of abundance versus presence—absence data are compared in 
ref. 10. The two kinds of analysis give qualitatively comparable results, although the 
abundance analyses are somewhat more powerful in detecting non-randomness. 
It is generally assumed that fossil deposits miss the rarest species in a commu- 
nity because preservation potential increases with abundance; more individuals 
means more opportunities for fossilization events. If rare species are more likely to 
form segregated pairs, we would expect to see more segregations in modern data 
sets because they should sample more of the rare species than comparable fossil 
data sets. Within fossil data sets, we would expect to see more segregated pairs in 
data sets with better sampling and more rare species. We evaluated this using a data 
visualization technique. We present the results of our analyses as a series of pairwise 
species by species matrices and order species by occupancy (see Supplementary 
Information: data sets). Occupancy decreases as one moves to the right on the 
x axis and up on the y axis. Species with the highest occupancy are close to the 
origin. The pairwise associations are colour-coded: grey for random pairs, blue for 
aggregated pairs, and red for segregated pairs. If increasing sampling of rare species 
is responsible for the pattern we document, then we would expect to see a prepon- 
derance of red, segregated pairs in the upper, right-hand corner of the species by 
species matrices. In particular, this should show up in data sets with better sampling 
and those that encompass the shift from more aggregated to more segregated pairs 
(for example, Pleistocene-Holocene mammals and pollen, modern mammals in 
Kenya, and modern plants in Wisconsin). This is not the pattern that we see. In fact, 
we find that segregated pairs tend to form with species of intermediate occupancy 
and that aggregated pairs form both with common species and with rare species. 
Differences in the sampling of rare species between fossil and modern data sets 
cannot account for the shift in species associations over time. 
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Extended Data Figure 1 | Map showing distribution of fossil data sets. c61lad8ab017d49e1a82f580ee1298931. ArcGIS and ArcMap are the 
Polygons enclose the localities for each fossil data set included in our intellectual property of Esri and are used herein under license. Copyright 
analyses. Mammals are in blue, plants are in green. Dark colours represent © Esri. All rights reserved. For more information about Esri software, 
data sets that are older. This map was created using ArcGIS software by please visit http://www.esri.com. 


Esri and can be found at http://www.arcgis.com/home/item.html?id= 
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Extended Data Figure 2 | Breakpoint analysis of the composite data. the mean and confidence intervals around the change in slope of the two 
The analysis was performed on all data including the islands (see Fig. 3 resulting linear models. The breakpoint analysis was run using all the 
main text), showing the mean estimate (red point; 10*’”8 years) and 95% data resolved to the best possible dates to allow for the greatest amount of 
confidence interval (error bar at base of plot; 101°, 10°-*°! years) of the power in detecting where the switch occurred. However, the results were 
initiation of reduced percentage of aggregated species pairs, as well as similar when island data were excluded. 
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Extended Data Figure 3 | Weighted Loess curve with and without Australia (dark grey), South America (dark blue), Africa (orange). 
modern data. Loess curve weighted by number of sites with shaded Point shapes indicate type of data: pollen (square), mammals (triangle), 
95% confidence intervals illustrates the reduction in the proportion of macroplants (circle). Data on terrestrial communities from ref. 2 are 
aggregated species pairs towards the present. Data are analysed with (black diamonds. Only mainland assemblages were included in the calculation 
line and shading) and without (red line and shading) the modern data. for the weighted Loess curve and the density plots here and in Fig. 1. 


Colours indicate continent: North America (green), Eurasia (purple), 
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Extended Data Figure 4 | Results of PAIRS analyses of two Pleistocene- 
Holocene fossil data sets. a, Mammal data for three periods: late 
Pleistocene (40,000-10,000 years ago), Holocene (10,000-500 years ago), 
and modern (present, literature survey data). Note the switch from >50% 
aggregated pairs to <50% aggregated pairs occurs in the Pleistocene versus 
Holocene data sets. b, Results for large and small mammals separately. 
There is a significant difference (P < 0.001) between the Holocene and the 
Pleistocene for all mammals (blue bars) and for large mammals (purple 
bars) only (P= 0.015). However, the direction of the shift was different. 
For all mammals, there were fewer positive associations in the Holocene, 
whereas, for large mammals only, there were more positive associations in 
the Holocene. c, North American pollen data from the past 21,000 years 


(modified from ref. 20). Data are from cores resolved into 1,000-year time 
slices. The size of the circle is related to the number of sites in the data set. 
The point at 0 represents a period from the present to 1,000 years ago, but 
is sampled from the top of the pollen cores using the same methodology 
as the older time slices. Note the trend of decreasing percentage of 
aggregations towards the present, especially in times with the largest 
numbers of sites (after 14,000 years). A weighted regression that takes into 
account the number of sites in each time slice is significant (dashed green 
line; P= 0.04, adjusted R*=0.15). The final time slice at 0 records a shift 
from a dominance of aggregated pairs to a dominance of segregated pairs. 
The sampling methods and data structure are the same for all time slices. 
Grey dashed line is at 50% in each panel. 
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Extended Data Figure 5 | Relationship between the proportion of plotted as a function of spatial (a, b) or temporal (c, d) grain (b, d) or 
aggregated pairs and scale. The proportion of significant pairs that are extent (a, c). Linear regressions are non-significant and adjusted R? values 
aggregated is not the result of temporal or spatial scale of data. Arcsine are extremely low. 


transformation of the proportion of significant pairs that are aggregated 
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Extended Data Figure 6 | Climate variability measured during the 
temporal extents of the fossil data sets. Proportion of significant pairs 
that are aggregated shows no relationship with climate variability within 
a time interval. a, b, Climate variability was quantified as the standard 
deviation of all climate proxy data for that time interval (a), or the 
standard deviation of the first differences in climate across the interval (b). 
c, d, Climate variability (standard deviation of first differences) and 
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age of data sets show no relationship (c), suggesting no trend in climate 
variability sampled by the fossil data sets across the Phanerozoic. There 
is a significant, but weak, positive relationship (d, dashed line) between 
climate variability and decreasing age of the data sets when the linear 


model is weighted by sample size of climate proxy data enveloped by the 
temporal window of the fossil data sets (P= 0.007, adjusted R* = 0.0998). 
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Extended Data Figure 7 | Relationship between proportion of sets themselves may potentially not account for important (and possibly 
aggregated pairs and fixed-width time intervals. High-amplitude ecologically significant) climatic variability from the previous millennia. 
Pleistocene climate variability oscillating between glacial and interglacial To incorporate this possibility, we re-analysed the relationship between 
cycles may have imposed its own novel pressures on floral and faunal the proportion of aggregated species pairs and climate variability of each 
communities. Furthermore, ecological impacts may lag behind climate data set, but included climate across the preceding 100,000 years, 10,000 
episodes themselves, complicating efforts to quantify climatic links to years (not shown), and 1,000 years (not shown). As in the more restrictive 
changes in the proportion of aggregated species pairs over time. Thus, analysis (Fig. 3a), there is consistently no relationship between climate 


limiting our measure of climate variability to the temporal span of the data _ variability and the proportion of aggregated species pairs. 
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Extended Data Table 1 | Raw data for Fig. 1 


Dataset Data Owner Type Cont #Rand #Agg #Seg #Spp #Sites Age (yr) Temp Temp Spat Spat 
Grain Extent Grain Extent 
(yr) (yr) (km) (km) 
Turkana-Natoo Behrensmeyer M AF 253 0 0 23 10 1365000 200000 250000 10 80 
Turkana-Okote Behrensmeyer M AF 1081 0 () 47 16 1470000 200000 250000 10 80 
Turkana-Kaitio Behrensmeyer M AF 351 0 0 27 8 1650000 200000 250000 10 80 
Turkana-KBS Behrensmeyer M AF 2015 0 1 64 37 1715000 200000 250000 10 80 
Turkana-Kalochoro Behrensmeyer M AF 276 0 0 24 rd 2100000 200000 250000 10 80 
Turkana-Burgi Behrensmeyer M AF 1533 0 ry 56 23 2250000 200000 250000 10 80 
Turkana-Lomekwi Behrensmeyer M AF 990 0 ie} 45 10 2985000 200000 250000 10 80 
Turkana-TuluBor Behrensmeyer M AF 778 0 2 40 12 3035000 200000 250000 10 80 
Turkana-LokochotMoiti Behrensmeyer M AF 496 0 0 32 7 3705000 200000 250000 10 80 
Siwalik-3.2-3.6Ma Behrensmeyer M EA 210 0 0 2A 16 3300000 3000 1500000 5 80 
Siwalik-6-5Ma Behrensmeyer M EA 820 0 0 41 30 6300000 3000 1500000 5 80 
Siwalik-7Ma Behrensmeyer M EA 4460 4 1 95 111 7600000 3000 1500000 5 80 
Siwalik-8Ma Behrensmeyer M EA 5050 ie) ie) 101 170 8500000 3000 1500000 5 80 
Siwalik-9Ma Behrensmeyer M EA 4186 0 0 92 177 9400000 3000 1500000 5 80 
Siwalik-11-10Ma Behrensmeyer M EA 10703 4 24 147 92 11000000 3000 1500000 5 80 
Siwalik-13-12Ma Behrensmeyer M EA 8376 6 3 130 168 13100000 3000 1500000 5 80 
Siwalik-15-14Ma Behrensmeyer M EA 4454 7 4 95 38 14500000 3000 1500000 5 80 
Siwalik-17-16Ma Behrensmeyer M EA 2210 0 1 67 20 17100000 3000 1500000 5 80 
Eurasia-spp-MN17 Eronen M EA 3644 9 2 86 32 2750000 650000 650000 300 4500 
Eurasia-spp-MN16 Eronen M EA 15545 3000000 800000 800000 300 4500 
Eurasia-spp-MN15 Eronen M EA 5993 2 0 110 27 3800000 800000 800000 300 4500 
Eurasia-spp-MN14 Eronen M EA 2145 0 ie) 66 28 4750000 1100000 1100000 300 4500 
Eurasia-spp-MN11 Eronen M EA 14532 8450000 800000 800000 300 4500 
Eurasia-spp-MN10 Eronen M EA 8514 1 0 131 42 9250000 500000 500000 300 4500 
Eurasia-spp-MNO7 Eronen M EA 9977 30 4 142 65 11850000 1300000 1300000 300 4500 
Eurasia-spp-MNO6 Eronen M EA 9581 10 0 139 87 13850000 2700000 2700000 300 4500 
Eurasia-spp-MNO5 Eronen M EA 14119 51 26 169 96 16100000 1800000 1800000 300 4500 
Eurasia-spp-MN04 Eronen M EA 7994 0 127 65 17500000 1000000 1000000 300 4500 
Eurasia-spp-MNO3 19000000 | 2000000 _| 2000000 300 4500 
Eurasia-spp-MNO2 0 1 83 35 21400000 2800000 2800000 300 4500 
SAfr-LgMammals-MIS1 Faith M AF 1091 25 12 48 103 12000 1000 12000 10 600 
‘SAfr-LgMammals-MIS2 19 21 44 28 24000 2000 12000 10 600 
NAm-mammals-Modern Lyons M NA 9267 541 raed 146 67 30 50 50 50 5000 
NAm-mammals-Holo Lyons M NA 9860 404 467 147 214 5000 2000 10000 10 5000 
NAm-mammals-Pleist Lyons M NA 12727 667 467 167 88 25000 12500 30000 10 4500 
crypticTaxaDesertRodents Riddle M NA 1832 39 20 62 171 7.5 0.003 10 0.1 2000 
African-SPP-MOD Toth M AF 15795 84 179 14 30 50 50 25 800 
African-SPP-HIST Toth M AF 15632 85 178 14 90 50 50 25 800 
Wl-overstory-SPP-1950 Waller PI NA 512 26 23 34 168 60 1 4 0.0005 475 
Wl-understory-SPP-1950 Waller PI NA 12726 596 706 168 152 60 1 4 0.0005 475 
Wl-overstory-SPP-2000 33 20 39 168 10 1 4 0.0005 475 
Wl-understory-SPP-2000 Waller PI NA 12455 400 511 164 152 10 1 4 0.0005 475 
Abo DiMichele PI NA 934 9 3 44 207 290000000 55 16000000 0.07 450 
PennWolf DiMic! 2 0 54 28 300000000 500 5000000 0.07 80 
Rotliegend DiMichele PI EA 988 2 0 45 62 300000000 1000 6000000 0.07 72 
Calhoun Coal DiMichele Pl NA 378 0 0 28 55 304000000 500 10000 0.04 50 
Kootenai Jud Pl NA 494 0 2 32. 17 115000000 500 1000 0.01 15 
MoltenoS Labandeira PI AF 3655 0 0 94 8 234000000 3000 250000 0.5 200 
Molteno4 Labandeira PI AF 2278 0 0 68 12 234500000 3000 250000 0.5 200 
Molteno3 Labandeira PI AF 8778 (e) (e) 133 20 235000000 3000 250000 0.5 200 
Molteno2 Labandeira PI AF 6312 9 7. 113 43 235500000 3000 250000 0.5 200 
Molteno1 Labandeira PI AF 8256 0 0 129 9 236000000 3000 250000 0.5 200 
Burgersdorp Labandeira PI AF 300 0 0 25 8 244000000 3000 250000 0.5 200 
EarlyEocenePlants Wing PI NA 3086 99 55 81 78 55600000 500 460000 0.05 150 
LatePaleocenePlants Wing PI NA 2101 32 12 66 42 56200000 500 460000 0.05 150 
BigCedarRidgePlants Wing PI NA 8238 15 3 129 100 73000000 0 () 0.005 4 
pollenPG Bercovici Po NA 3808 20 0 88 150 66027000 2865 68000 25 55 
pollenK Bercovici Po NA 4041 49 5 91 71 66065000 825 73000 15 55 
NorthAm-pollen-Oka Blois Po NA 3501 60 500 1000 50 2645.993 
NorthAm-pollen-1ka Blois Po NA 3522 20 21 85 445 1000 500 1000 50 2645.993 
NorthAm-pollen-2ka Blois Po NA 3541 19 10 85 438 2000 500 1000 50 2645.993 
NorthAm-pollen-3ka Blois Po NA 3286 21 14 82 410 3000 500 1000 50 2645.993 
NorthAm-pollen-4ka Blois Po NA 3285 27 9 82 397 4000 500 1000 50 2625.521 
NorthAm-pollen-5ka Blois Po NA 3202 25 13 81 397 5000 500 1000 50 2625.521 
NorthAm-pollen-6ka Blois Po NA 3472 10 4 84 354 6000 500 1000 50 2625.521 
NorthAm-pollen-7ka Blois Po NA 3141 16 3 80 335 7000 500 1000 50 2625.521 
NorthAm-pollen-8ka Blois Po NA 2899 25 2 77 300 8000 500 1000 50 2623.018 
NorthAm-pollen-9ka 9000 500 1000 50 2623.018 
NorthAm-pollen-10ka Blois Po NA 2528 21 7 72 251 10000 500 1000 50 2554.072 
NorthAm-pollen-11ka Blois Po NA 2466 17 2 71 205 11000 500 1000 50 2554.072 
NorthAm-pollen-12ka Blois Po NA 2542 13 1 72 155 12000 500 1000 50 2554.072 
NorthAm-pollen-13ka Blois Po NA 2542 12 2 72 117 13000 500 1000 50 2532.103 
NorthAm-pollen-14ka Blois Po NA 2200 8 3 67 76 14000 500 1000 50 2532.103 
NorthAm-pollen-15ka Blois Po NA 1946 3 4 63 50 15000 500 1000 50 2532.103 
NorthAm-pollen-16ka Blois Po NA 1826 16000 500 1000 50 2532.103 
NorthAm-pollen-17ka Blois Po NA 1480 1 4 55 30 17000 500 1000 50 2532.103 
NorthAm-pollen-18ka Blois Po NA 1324 id 1 52 21 18000 500 1000 50 2532.103 
NorthAm-pollen-19ka 19000 500 1000 50 2531.385 
NorthAm-pollen-20ka Blois Po NA 1481 0 4 55 15 20000 500 1000 50 1481.922 
NorthAm-pollen-21ka Blois Po NA 1176 0 0 49 13 21000 500 1000 50 2134.789 
Numbers of aggregated versus segregated pairs and spatial and temporal scale of the ETE data sets included in this analysis. M, mammals; PI, macroplants; Po, pollen. AF, Africa; EA, Eurasia; NA, 
North America. #Rand, the number of taxon pairs that were not significantly different from random. #Agg, the number of significant taxon pairs that were aggregated. #Seg, the number of significant 
axon pairs that were segregated. #Spp, the number of species in the data set. #Sites, the number of sites in the data set. Age (yr) is the midpoint age of the data set. Temp Grain (yr), temporal grain in 


years or the average amount of time encompassed by a site in the data set. Temp Extent (yr), the maximum amount of time encompassed by a data set. Spat Grain (km), the average distance froma 
site that fossils were transported. Spat Extent (km), the maximum linear distance encompassed by the data set. 
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Extended Data Table 2 | Effect of taxonomic resolution 


Dataset % Aggregations for % Aggregations for Direction of difference 

Species Genera when lumpin. 
WI Overstory Plants-1950 0.531 0.800 Increase 
WI Overstory Plants-2000 0.622 0.773 Increase 

Eurasia mammals MNO2 0 0 No change 
Eurasia mammals MNO3 0.815 0.732 Decrease 
Eurasia mammals MNO4 1 0.75 Decrease 
Eurasia mammals MNO5 0.662 0.608 Decrease 
Eurasia mammals MNO6 1 0.75 Decrease 
Eurasia mammals MNO7 0.882 0.864 Increase 
Eurasia mammals MNOS 1 0.75 Decrease 
Eurasia mammals MN10 0.667 0.857 Increase 
Eurasia mammals MN11 NA 1 Increase 

Eurasia mammals MN14 NA NA No change 

Eurasia mammals MN15 1 1 No change 
Eurasia mammals MN16 0.806 1 Increase 
Eurasia mammals MN17 0.818 0.619 Decrease 
Africa mammals - historical 0.298 0.028 Decrease 
Africa mammals — modern 0.382 0.031 Decrease 
Great Basin Rodents Cryptic 0.661 0.500 Decrease 


Change in the proportion of significant pairs when data sets are analysed at the species and genus levels. If lower taxonomic resolution of fossil data sets is driving the pattern of increased aggregations 
in the fossil data, we would expect to see increases in the percentage of aggregations when data are analysed at the genus level. Instead, most data sets show a decrease in the percentage of aggre- 
gated pairs. Only 6 of the 18 data sets analysed at multiple taxonomic resolutions show the expected increase. Nine show a decrease and three show no change. One data set (Great Basin Rodents 
Cryptic) was analysed at the species level and then taxonomically resolved with genetic data to include cryptic species. For that data set only, ‘% aggregations for genera’ corresponds to the data set 
with cryptic species lumped and ‘% aggregations for species’ corresponds to the data set with cryptic species resolved. 
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Extended Data Table 3 | Proportion of aggregated pairs across critical intervals 


Taxon 


Large 
Mammals 
Mammals 


Pollen 
Plants 


Pollen 
Pollen 


Event 


Pleistocene-Holocene 
Transition 
Pleistocene-Holocene 
Transition 
Pleistocene-Holocene 
Transition 

PETM 

K-Pg 

Permian Crises 


Place 


South Africa 


North America 


North America 


North America 


North America 
Greenland 


Trend in % Positive 
Associations 


Increase 
Decrease 
Decrease 
No Change 


No Change 
No significant pairs 


LETTER 


p value 
0.007 


<0.001 


Significance of change in positive versus negative associations across critical intervals. 
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Influence of extreme weather disasters on global 


crop production 


Corey Lesk!, Pedram Rowhani? & Navin Ramankutty!? 


In recent years, several extreme weather disasters have partially or 
completely damaged regional crop production! *. While detailed 
regional accounts of the effects of extreme weather disasters exist, 
the global scale effects of droughts, floods and extreme temperature 
on crop production are yet to be quantified. Here we estimate for the 
first time, to our knowledge, national cereal production losses across 
the globe resulting from reported extreme weather disasters during 
1964-2007. We show that droughts and extreme heat significantly 
reduced national cereal production by 9-10%, whereas our analysis 
could not identify an effect from floods and extreme cold in the 
national data. Analysing the underlying processes, we find that 
production losses due to droughts were associated with a reduction 
in both harvested area and yields, whereas extreme heat mainly 
decreased cereal yields. Furthermore, the results highlight ~7% 
greater production damage from more recent droughts and 8-11% 
more damage in developed countries than in developing ones. Our 
findings may help to guide agricultural priorities in international 
disaster risk reduction and adaptation efforts. 

In many regions of the world, there have been considerable changes 
in the nature of droughts, floods and extreme temperature events since 
the middle of the twentieth century®*. Over agricultural areas, disasters 
arising from extreme weather can cause marked damage to crops and 
food system infrastructure, with the potential to destabilize food sys- 
tems and threaten local to global food security. In recent years, nearly 
one-quarter of all damage and losses from climate-related disasters has 
been in the agricultural sector in developing countries”. With such dis- 
asters expected to become more common in the future’, policymak- 
ers need robust scientific information to develop effective disaster risk 
management and adaptation interventions (for example, infrastructure, 
technology, management and insurance) to protect the most vulnerable 
populations and to ensure global food security. 

Whether an extreme weather event results in a disaster depends not 
only on the severity of the event itself, but also on the vulnerability and 
exposure of the human and natural systems that experience it®. Past 
research has addressed agricultural effects of specific weather extremes 
with fixed definitions, such as degree days above some threshold!*!>. 
This approach probably underestimates the crop effects of extreme 
weather disasters (EWDs), because similar extreme weather events may 
have differing effects depending on the vulnerability of the exposed 
system. 

In this study, we address this bias by using a disaster data set com- 
piled based on human impact. In addition, we attend to two further 
limitations of previous work on extreme weather and agriculture. First, 
several regional empirical studies have highlighted the adverse effects 
of extreme heat events on crop yields!*"'3, and global modelling efforts 
have estimated future crop yield declines due to increasing extreme 
heat stress'*!°, But this emphasis on crop yields offers an incomplete 
picture of crop production because of the potential for compensation 
or compounding of yield impacts by changes in harvested area!®; 
and because crop production (and not yields)—together with access 


and utilization—determines food security?717-18, Second, we seek 
to investigate the agricultural effects of often-overlooked extreme 
weather events, namely floods and extreme cold**. Thus, our study is 
the first, to our knowledge, that takes an empirical approach to estimat- 
ing the influence of EWDs on crop area, yields and production at the 
global scale. 

We use a statistical method, superposed epoch analysis~ (also known 
as compositing, see Methods), to estimate average national per-disaster 
cereal production losses (Food and Agriculture Organization of the 
United Nations (FAO), http://faostat3.fao.org) across the globe due 
to reported droughts, floods and extreme temperature disasters from 
1964 to 2007. Furthermore, we estimate the effects on cereal yield and 
harvested area separately to identify processes leading to production 
losses. On the basis of ~2,800 reported extreme hydro-meteorological 
disasters collated by the Emergency Events Database (EM-DAT, http:// 
www.emdat.be/database), we find that national cereal production 
during a drought was significantly reduced by 10.1% on average (95% 
confidence interval 9.9- 10.2%), while years with extreme heat led to 
national production deficits of 9.1% (8.4-9.5%; Fig. 1a, b and Extended 
Data Table 1). These production deficits were equivalent to roughly 
6 years of production growth; however, no significant lasting effects 
were noted in the years after the disasters. Estimated mean production 
losses were driven mainly by a preponderance of disasters with moder- 
ate effects on crops, as opposed to a few extreme cases (Extended Data 
Fig. 1), and were not strongly influenced by sample size (see Extended 
Data Fig. 2, Extended Data Table 2 and Supplementary Discussion). 

During 1964-2007, these estimated EWD effects represent a loss of 
1,820 million Mg due to droughts (approximately equal to the global 
maize and wheat production in 2013), and 1,190 million Mg due to 
extreme heat disasters (more than the global 2013 maize harvest). Over 
2000-2007 (the period with the most complete disaster reporting com- 
pared with earlier decades), 6.2% of total global cereal production was 
lost due to EWDs relative to an estimated counterfactual global produc- 
tion without EWD effects (3.0% to extreme heat and 3.2% to drought). 

Cereal yield declines during EWDs were 5.1% (4.9-5.2%) and 7.6% 
(7.0-8.1%) for drought and extreme heat, respectively (Fig. 2a). The 
harvested area dropped 4.1% (4.0-4.3%) during droughts, but was not 
significantly affected by extreme heat (Fig. 2b). This may be due to the 
shorter duration of extreme heat relative to droughts—while approx- 
imately one-third of droughts in this study spanned several years, all 
extreme heat disasters took place within a single year. Droughts may 
thus be more likely to last long enough to cause complete crop failure 
and discourage planting, while extreme heat disasters, especially out- 
side key crop developmental stages, may affect crop growth and reduce 
yields without critically damaging harvests. 

Our estimated yield deficits from EWDs cannot be directly com- 
pared to previous studies of the impact of seasonal mean climate trends 
over the same period” (see Supplementary Discussion). However, we 
derived a comparable measure to that reported previously”', and esti- 
mated a yield sensitivity of 6-7% per 1 °C increase in seasonal mean 
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Figure 1 | Influence of EWDs on national cereal production. 

a-d, Normalized production composites for drought (n= 222) 

(a), extreme heat (n = 32) (b), flood (n = 756) (c) and extreme cold 

disasters (n = 51) (d) over 7-year windows centred on the disaster year 

(blue lines). Box plots depict the distributions of 1,000 false-disaster 

control composites, with red crosses denoting extreme outliers, and red 


weather associated with extreme heat disasters, which suggests that our 
observed extreme heat effects are not necessarily independent from 
those detected in studies examining changes in seasonal temperatures 
(Extended Data Fig. 3). Methodological differences and uncertainties 
prevent us from drawing strong conclusions based on this comparison. 
Our drought impacts, however, seem to be independent of previous 
estimates that used seasonal weather anomalies (see Supplementary 
Discussion). 

Our results do not show significant production effects from extreme 
cold and floods (Fig. 1c, d). A potential explanation for this is that 
floods tend to occur in the spring in temperate regions as a result of 
snowmelt, and cold weather susceptibility in most agricultural regions 
is highest outside the growing season, which may render a sizeable 
portion of the flood and extreme cold disasters analysed in this study 
agriculturally irrelevant. The estimated lack of response may also be 
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Figure 2 | Influence of EWDs on national cereal yields and harvested 
area. a, b, Yield (blue) and harvested area (red) composites for drought 
(n= 222) (a) and extreme heat (n = 32) (b), with significant points (those 
lying beyond the control box plot whiskers) marked by asterisks (box plots 
not shown for clarity). Drought was associated with significant deficits 

in both yield and harvested area (5.1 and 4.1%), whereas extreme heat 
revealed only significant yield impacts of 7.6% with no significant effect on 
harvested area. 


Year from event 


dashes denoting medians. Production during drought and extreme heat 
years was 10.1% and 9.1% below the control mean, respectively, whereas 
no significant production signal was detected for floods or extreme cold. 
Production resumed normal levels immediately after drought and extreme 
heat. The increasing trend in production over the 7-year window reflects 
the observed growth trend. 


an artefact of the spatial dimension of these disasters. While drought 
and extreme temperature affect broad regions, floods are a function 
of both weather and topography, and can be highly localized within 
a country”. Since this study uses country-level agricultural statistics, 
one may speculate that a more noticeable flood effect on sub-national 
production is masked at the national scale. 

Several additional analyses offer more detailed insights into the 
effects of these EWDs on cereal production. Cereals in the more tech- 
nically developed agricultural systems of North America, Europe and 
Australasia suffered most from droughts, facing on average a 19.9% 
production deficit compared to 12.1% in Asia, 9.2% in Africa, and no 
significant effect in Latin America and the Caribbean (overall differ- 
ence in means P= 0.02; Fig. 3a and Extended Data Tables 3 and 4). This 
more severe production impact in the developed nations was driven 
by a substantial yield deficit of 15.9%, with no significant reduction in 
harvested area (Fig. 3b, c). We see three possible explanations for this 
pattern. First, it may arise from a tendency among lower-income coun- 
tries to encompass diverse crops and management across many small 
fields, which may allow for some fields to resist drought better than 
others. This might reduce the national drought sensitivity compared 
to higher-income countries, where large-scale monocultures are more 
dominant. Second, lower-income countries may better resist drought 
because smallholders tend to use risk-minimizing strategies compared 
to the yield-maximizing ones prevalent in higher-income countries. 
Third, the pattern may relate to generally lower fair-weather yields in 
lower-income countries. In Asia, we found a significant reduction of 
8.8% in harvested area during droughts with no corresponding yield 
deficit, suggesting that this region has a greater tendency for total crop 
failure in the event of a drought rather than harvesting with reduced 
yields’. The production effects in Africa did not correspond to signif- 
icant deficits in either yield or harvested area. 

While the production of all three crops was similarly affected by 
droughts (5-6% deficit each; Fig. 4a, Extended Data Tables 2 and 5), 
only maize was significantly affected by extreme heat (11.7% deficit, 
P=0.01) (Fig. 4d). Maize was also the only crop with significant yield 
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Figure 3 | A regional analysis of the influence of drought. a-c, Regional 
composites of production (a), yield (b) and harvested area (c) for 
drought, with significant points (those lying beyond the control box plot 
whiskers) marked by asterisks (box plots not shown for clarity). P values 
reflect significant differences between regions in drought-year response 
(Kruskal-Wallis test). The drought-year normalized production is 7.8% 


effects (12.4%, P=0.002) (Fig. 4b, e). We are hesitant to draw strong 
conclusions based on this difference, as it may be due to differing var- 
iance as well as mean (see Extended Data Table 6 and Supplementary 
Discussion). Furthermore, it may reflect the fact that maize is generally 
grown during summer months, which have the highest probabilities of 
extreme heat as defined in EM-DAT, while wheat is grown during the 
spring. Disaster data with monthly or daily resolution would enable us 
to investigate whether this apparent susceptibility of maize is a result 
of differing growing season. 

Finally, more recent droughts (1985-2007) caused cereal produc- 
tion losses averaging 13.7%, greater than the estimated 6.7% during 
earlier droughts (1964-1984) (P=0.008, Fig. 5), which may be due 
to any combination of rising drought severity (although whether 
drought severity has increased globally is presently debated)**~*%, 
increasing vulnerability®’’ and exposure to drought®, and/or chang- 
ing reporting dynamics (Extended Data Fig. 4). Sample size limita- 
tions prevented us from repeating a regional and temporal analysis 
for extreme heat. 

Some limitations of our analyses are worth noting. First, we mainly 
focus on four principal types of EWDs, but follow-up studies should 
include tropical storms and extreme precipitation and wind events, 
especially since they may have an increasingly important effect on 
agriculture in the context of climate change”®. Second, our estimates 
are biased towards more recent disasters as they are more abundantly 
reported in EM-DAT than older ones (see Extended Data Fig. 4 and 
Supplementary Discussion). Third, we use EWDs from the EM-DAT 
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and 10.7% lower (P= 0.02) in developed Western countries (n = 28) 
than in Asia (n = 32) and Africa (n= 125) (a), a difference driven by a 
significantly greater yield deficit (P= 0.002) (b). Meanwhile, the Latin 
America (L.Am) and Caribbean (Carib.) region (n = 37) exhibits no 
significant response to drought. Aus., Australasia; Eur, Europe; N.Am., 
North America. 


database, which collates disasters based on several criteria for sub- 
stantial human impact (Methods). We may be underestimating the 
true effects of EWDs if disasters are included mainly based on urban 
impacts, or if extreme events occurring in sparsely populated areas are 
less likely to qualify as disasters. Finally, since we observe agricultural 
impacts at the national level, more notable local and regional effects of 
disasters may be muted (but conversely, finding a signal at the national 
level highlights the substantial influence of droughts and extreme heat). 
Future studies may arrive at a more detailed estimate by using subna- 
tional agricultural data, localizing the reported disasters within nations, 
selecting disasters taking place during the growing season, and con- 
trolling for severity of disasters. Linking the definitions of EWDs used 
in this study with statistical meteorological definitions will also enable 
a forecasting of future impacts. 

Overall, there are four main conclusions from our study. First, over 
the period 1964-2007, drought and extreme heat substantially damaged 
national agricultural production across the globe. Within the frame- 
work of this study, no effect on agriculture was identified from floods 
and extreme cold. Second, drought reduced cereal yield and completely 
damaged crops, whereas extreme heat only affected yield, reflecting 
clear differences in the processes leading to overall production effects. 
Third, this study highlights an important temporal dimension to these 
impacts. While the damage to cereal production is considerable, this 
effect is only short term, as agricultural output rebounds and continues 
its growth trend after the disaster. Furthermore, we show that recent 
droughts had a larger effect on cereal production than earlier ones. 
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Figure 4 | The influence of drought and extreme heat on maize, rice and 
wheat. a-f, Drought and extreme heat composites of production (a, d), 
yield (b, e) and harvested area (c, f) for maize (blue), rice (red) and wheat 
(green), with significant points (those lying beyond the control box plot 
whiskers) marked by asterisks (box plots not shown for clarity). P values 
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reflect significant differences between crops in disaster-year response 
(Kruskal-Wallis test). Maize production (n = 28) responds more (P= 0.01) 
to extreme heat than wheat (n = 32) and rice (n= 16), an effect driven by a 
substantial yield deficit (P= 0.002). For drought data, maize (n = 208), rice 
(n=171) and wheat (n = 234). 
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Figure 5 | A temporal analysis of the influence of drought. 

a, b, Production composites for earlier (1964-1984, n= 126) (a) and 
later (1985-2007, n= 121) (b) droughts, with boxplots of 100 respective 
control composites. In later instances, mean drought-year production 
losses were greater (13.7%) than in earlier instances (6.7%; P = 0.008, 
Kruskal-Wallis test). 


Finally, our regional and crop-specific analysis finds that developed 
nations suffer most from these EWDs. 

Present climate projections suggest that extreme heat events will be 
increasingly common and severe in the future’. Droughts are likely 
to become more frequent in some regions, although considerable 
uncertainty persists in the projections®. This study, by highlighting 
the important historical effects of EWDs on agriculture, emphasizes 
the urgency with which the global cereal production system must 
adapt to extremes in a changing climate. Understanding the key pro- 
cesses leading to such crop losses enables an informed prioritiza- 
tion of disaster risk reduction and adaptation interventions to better 
protect the most vulnerable farming systems and the populations 
dependent on them. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Superposed epoch analysis (SEA) is used to isolate an average EWD response 
signal using time series of national agricultural production data and EWDs. SEA is 
a statistical approach that has been used to enhance the signal (that is, influence of 
particular events) in time-series data, while reducing noise due to extraneous vari- 
ables!*. The EWDs are compiled from the Emergency Events Database (EM-DAT; 
http://www.emdat.be/database) and consist of 2,184 floods, 497 droughts, 138 
extreme heat and 194 extreme cold disasters from 177 countries over the period 
1964-2007. EM-DAT collects information on a reported disaster if at least ten peo- 
ple died, a state of emergency was declared, international assistance was called, or 
at least 100 people were injured, made homeless or required immediate assistance. 
Disaster reports are gathered from various organizations including United Nations 
agencies, governments, and the International Federation of Red Cross and Red 
Crescent Societies®. The agricultural data consist of country-level total production, 
average yield, and total harvested area data for 16 cereals (http://faostat3.fao.org) 
covering the 177 countries in the set of EWDs from 1961 to 2010. 

From the time series of agricultural data, we extracted shorter sets of time series 
using a 7-year window centred on the year of occurrence of each EWD, with 3 years 
of data preceding and following each EWD. The data were normalized to the aver- 
age of the 3 years preceding and following the event to remove the absolute magni- 
tude of national data from the signal. For multi-year droughts, we averaged across 
all drought years to produce a single disaster year datum. For a 3-year drought, 
for example, the 7-year window became a 9-year window with seven data points 
(with the middle 3 years being averaged and assigned to year 0). The 7-year sets of 
EWD time series were then centred on the disaster year and averaged year-wise to 
yield single composited time-series of production, yield and harvested area for each 
EWD type (a total of 12 composited time series). The averaging thus strengthens 
the signal at the central year of EWD occurrence, while also cancelling the noise 
in the non-disaster years preceding and following the event. 

During compositing, points on individual time series co-occurring with another 
disaster in the set were excluded from the mean. This procedure resulted in variable 
sample size across the 7 years of the composites. For brevity, we have presented 
mean sample sizes across all years; complete tabulated sample sizes are displayed in 
Extended Data Tables 2 and 4. Our composited mean estimate does not seem to be 
influenced by outliers (see Extended Data Fig. 1 and Supplementary Discussion). 
The signal-to-noise strength will certainly depend on the sample size, and we 
performed an analysis to estimate the influence of sample size (see Extended Data 
Tables 2 and 4, Extended Data Fig. 2 and Supplementary Discussion). 

In addition to average per-disaster estimates, we also calculated aggregate pro- 
duction losses over specific time periods. For each extreme heat or drought, we 
first applied the average per-disaster percentage loss estimate (different values for 
extreme heat or drought) to the average national production across the six adjacent 
non-disaster years. We then computed the aggregate drought or heat-related global 


production loss for each year by summing the production losses for each disaster 
over the given time period. We estimated the percentage of global production lost 
to the EWDs relative to an estimated counterfactual global production in a world 
without EWDs (the latter being the sum of observed global production plus the 
estimated production loss). 

The significance-testing procedure involved setting up a ‘control’ estimate by 
randomly resampling the agricultural data using sets of fictitious disasters with 
randomly generated years and countries of occurrence. The fictitious EWD time 
series were averaged as for the true ones to yield composited ‘control time series, 
and the entire process was repeated 1,000 times. We quantified EWD-year deficits 
in production, yield and harvested area by subtracting the true EWD time series 
from the mean of the controls. Excluding randomly generated disasters that hap- 
pened to be real disasters systematically raised the impact estimates by ~1%; to 
present a more conservative and rigorous detection of the disaster signal, we elected 
not to exclude such pseudo-disasters. Note that we chose not to de-trend the time 
series before compositing to remove technology-driven growth, but rather simply 
estimate the disaster signal as difference from control (see Fig. 1). We estimated the 
95% confidence intervals for our point estimates of impacts using an approach sim- 
ilar to a delete-one jackknife resampling method (see Supplementary Discussion). 

The percentage significance of each estimate of the EWD composites relative 
to control was estimated as the percentage of 1,000 control points less than the 
EWD composite estimate for each year. Points with estimated significance of 
<0.5% or >99.5% were considered significant deficits and surpluses, respectively, 
corresponding to a two-tailed 99% confidence level. While we chose a two-tailed 
approach for robustness, we found no significant surpluses. The significant points 
appear as asterisks in Figs 2-4, while for Figs 1 and 5 we present the EWD compos- 
ites with the distribution of controls represented as an array of box-and-whisker 
plots for a visual representation of significance. The complete tabulated percentage 
significance values are presented in Extended Data Tables 1, 3 and 5. 

The earlier-versus-later analysis for droughts was performed by applying the 
SEA procedure to the set of droughts divided roughly equally into earlier and later 
halves. Similarly, the regional analysis was conducted by repeating SEA for full set 
of disasters divided into four regional groupings, and the by-crop composites were 
obtained by repeating SEA on the full disaster sets using crop-specific agricultural 
data from the FAO (http://faostat3.fao.org). Statistical significance of differences 
between crop-specific, regional and earlier-versus-later composites was assessed 
using the Kruskal-Wallis test. We applied a quadratic transformation to the data 
for comparison to equalize variance between groups (verified using Levene’s test), 
and used non-parametric tests when comparing groups as normal assumptions 
were not met (see Supplementary Discussion). 

Code availability. All the core programs including codes to perform superposed 
epoch analysis and the various statistics described in this paper are available on 
Github (https://github.com/nramankutty/SEA-code). 
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Extended Data Figure 1 | Distributions of individual responses to values (falling towards the right of the red shaded areas) underlies the 
drought and extreme heat. a-f, Histograms of disaster-year differences negative mean disaster year signals, with a limited influence of extreme 
from means of 1,000 resampled controls for drought (n = 222) (a-c) and cases (those at the left of the red shaded areas). 


extreme heat (n= 32) (d-f). A preponderance of moderately negative 
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Extended Data Figure 2 | The influence of sample size on estimated production deficit (9.1% for extreme heat, 10.1% for drought). Most of the 
disaster effects. a, b, Estimated mean 16-cereal aggregated production initial variability at low sample sizes dissipates into the mean at well below 
deficit for extreme heat (a) and drought (b) in 200 sub-samples with size the actual sample size (n = 39 for extreme heat, n = 247 for drought). 


of (1, 2, ..., 2) (points). Dotted grey line shows the final estimated mean 
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distributions of 1,000 false-disaster control composites, with red crosses 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


—Heat 


—— Drought 


k= 
s+] 
J) 
~ 
h_ 
i} 
a2 
”“ 
7) 
1S) 
= 
cy) 
f= 
h_ 
= 
(s) 
1S) 
°o 
ro 
° 
t= 
i} 
ne} 
£ 
Ss 
2 


Extended Data Figure 4 | Time series of the number of extreme heat and series of reported disasters per year exhibits an increasing trend, probably 
drought disasters per year from the EM-DAT database. The EM-DAT the result of more complete disaster reporting in more recent decades with a 
database is based on a compilation of disaster reports gathered from various _ possible contribution from increasing disaster incidence. There is also large 
organizations including United Nations agencies, governments and the inter-annual variability in the number of disasters. 
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Extended Data Table 1 | Statistical significance of 16-cereal aggregate analysis 


Percent Significance (% of 1000 controls < disaster) 


Year from Disaster -3 -2 -1 0 1 2 3 
Production 13.5 16.8 9.7 0 98.9 98.4 92.9 

Drought Yield 22.0 71.0 8.9 0.0 93.3 86.3 68.7 
Harvested Area 21.5 7.3 35.2 0.0 92.4 95.3 87.5 
Production 86.2 87.2 70.5 0.0 61.3 85.8 27.2 

Extreme ; 

Heat Yield 62.1 56.5 59.8 0.1 81.9 82.7 60.5 
Harvested Area 84.2 91.2 68.9 22.1 25.1 70.6 9.8 
Production 84.5 44.0 97.2 48.7 31.5 30.5 81.9 

Extreme ; 

Cold Yield 41.4 18.9 97.2 50.2 60.7 57.8 94.8 
Harvested Area 90.1 69.3 81.0 53.2 19.1 17.5 47.9 
Production 37.1 67.7 90.6 98.6 93.2 97.2 73.6 

Flood Yield 57.1 65.0 29.5 93.1 85.1 98.3 96.2 
Harvested Area 26.6 61.6 97.3 95.7 85.9 75.0 30.6 


Percentage of points on control composites less than EWD composites for 16-cereal aggregate, 1,000 control replicates total. 
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Extended Data Table 2 | Sample sizes for individual crop and 16-cereal aggregate analyses 


Drought 


Extreme Heat 
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Year from Disaster 
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16-cereal aggregate 
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16-cereal aggregate 
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-3 
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34 
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231 
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Mean 
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Extended Data Table 3 | Statistical significance of regional analysis 


Percent Significance (% of 1000 controls < disaster) 


Year from Disaster -3 -2 -1 0 4 2 3 
Africa 23.1 24.1 11.4 0.0 93.8 96.0 92.1 
: Asia 31.4 23.3 14.5 0.0 93.4 83.7 49.9 
Production 
North America, Europe, Australasia 27.8 18.9 61.4 0.0 96.3 80.9 82.2 
Latin America and Caribbean 48.2 65.0 74.0 7.6 24.2 65.4 75.4 
Africa 16.4 85.7 3.6 0.7 88.0 86.2 88.4 
Vield Asia 10.6 3.0 5.0 32.2 96.1 90.6 51.3 
North America, Europe, Australasia 73.9 18.2 71.8 0.0 85.7 40.8 68.0 
Latin America and Caribbean 42.4 71.2 90.2 2.1 24.3 75.7 38.6 
Africa 43.2 3.6 48.5 0.7 85.1 90.1 67.7 
Asia 64.2 70.3 49.0 0.0 60.6 49.5 38.8 
Harvested Area 
North America, Europe, Australasia 5.7 23.3 37.7 4.6 96.7 93.9 81.3 
Latin America and Caribbean 57.0 65.1 52.4 38.5 32.7 49.7 76.8 


Percentage of points on control composites less than EWD composites for 16-cereal aggregate by region, 1,000 control replicates total. 
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Extended Data Table 4 | Sample sizes for regional analysis 


Year from Disaster 

North America, Europe, Australasia 
Asia 

Africa 


Latin America and Caribbean 


n= 
-3 -2 -1 0 
28 27 25 34 
39 31 35 36 
117 120 144 139 
37 42 39 38 
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Extended Data Table 5 | Statistical significance of individual crop analysis 


Percent Significance (% of 1000 controls < disaster) 


Year from Disaster -3 -2 -1 0 1 2 3 
Production 3.1 44.0 82.7 0.4 97.3 77.8 70.3 
Maize Yield 7.5 73.1 76.9 0.1 59.6 53.9 72.1 
Harvested Area 7.1 24.3 84.3 19.5 97.4 80.0 54.6 
Production 53.1 27.5 1.1 0.1 45.1 94.8 96.4 
Drought —_ Rice Yield 38.0 25.9 1.6 3.0 67.0 92.0 78.1 
Harvested Area 67.7 34.7 16.1 1.0 24.4 87.7 96.6 
Production 64.9 34.6 49.0 0.2 94.4 54.2 89.4 
Wheat Yield 77.4 70.1 47.2 2.0 63.5 64.6 28.6 
Harvested Area 35.5 23.2 49.2 5.0 91.0 59.3 96.1 
Production 70.1 81.2 95.8 0.2 72.7 44.3 6.8 
Maize Yield 59.1 28.7 90.0 0.0 62.5 85.4 35.6 
Harvested Area 67.4 93.7 81.8 52.8 64.7 18.0 7.5 
ee . Production 29.1 62.5 39.3 774 68.5 86.2 80.0 
Heat Rice Yield 51.1 78.2 90.1 62.2 60.5 82.6 38.5 
Harvested Area 26.1 35.8 14.9 80.4 66.9 78.6 86.4 
Production 63.6 83.0 25.9 2.3 57.9 83.1 35.9 
Wheat Yield 60.5 90.8 34.1 4.4 75.6 63.7 49.3 
Harvested Area 64.6 51.9 28.8 10.3 35.4 79.4 25.0 


Percentage of points on control composites less than EWD composites for individual crop analysis, 1,000 control replicates total. 
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Extended Data Table 6 | Kruskal-Wallis assumptions test results for group comparison analyses 


Analysis 


Regional 


Individual Crop: Drought 


Individual Crop: Extreme Heat 
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The calcium sensor synaptotagmin 7 is required for 


synaptic facilitation 


Skyler L. Jackman!, Josef Turecek', Justine E. Belinsky' & Wade G. Regehr! 


It has been known for more than 70 years that synaptic strength is 
dynamically regulated in a use-dependent manner’. At synapses 
with a low initial release probability, closely spaced presynaptic 
action potentials can result in facilitation, a short-term form of 
enhancement in which each subsequent action potential evokes 
greater neurotransmitter release”. Facilitation can enhance 
neurotransmitter release considerably and can profoundly 
influence information transfer across synapses’, but the underlying 
mechanism remains a mystery. One proposed mechanism is that 
a specialized calcium sensor for facilitation transiently increases 
the probability of release”, and this sensor is distinct from the 
fast sensors that mediate rapid neurotransmitter release. Yet 
such a sensor has never been identified, and its very existence has 
been disputed**®. Here we show that synaptotagmin 7 (Syt7) is a 
calcium sensor that is required for facilitation at several central 
synapses. In Syt7-knockout mice, facilitation is eliminated even 
though the initial probability of release and the presynaptic residual 
calcium signals are unaltered. Expression of wild-type Syt7 in 
presynaptic neurons restored facilitation, whereas expression of 
a mutated Syt7 with a calcium-insensitive C2A domain did not. 
By revealing the role of Syt7 in synaptic facilitation, these results 
resolve a longstanding debate about a widespread form of short- 
term plasticity, and will enable future studies that may lead to a 
deeper understanding of the functional importance of facilitation. 

Several mechanisms for facilitation have been proposed (Extended 
Data Fig. 1). In the ‘buffer saturation model, high concentrations of 
presynaptic Ca”* buffer capture incoming Ca** before it binds to the 
rapid synaptotagmin isoforms (1, 2 and 9) that trigger vesicle fusion 
at most synapses’. If the Ca’* buffer saturates during the first action 
potential, more Ca”* reaches release sites during subsequent action 
potentials, producing facilitation®*, Yet many facilitating synapses lack 
sufficient presynaptic Ca”* buffer to account for this form of facilita- 
tion’. Another theory suggests that a specialized Ca”* sensor responds 
to the smaller, longer-lasting Ca?" signals between action potentials’. 
In one scenario, this sensor modulates Ca** channels to produce 
use-dependent increases in Ca?" influx!®. Several candidate proteins 
have been proposed to act in this manner!"!”, but increased Ca? 
influx cannot account for facilitation at most synapses’. Alternatively, 
an unidentified Ca** sensor could mediate facilitation by directly 
increasing the probability of release (p). 

Syt7 is located presynaptically, and binds Ca?* with high affinity 
and slow kinetics'*"'*, making it a promising candidate sensor for the 
modest increases in residual Ca”* that mediate facilitation. Previous 
studies suggest that Syt7 contributes to a slow phase of transmission 
known as asynchronous release'”!8, and to Ca”+-dependent recovery 
from depression’, but the role of Syt7 in facilitation was not exam- 
ined because these studies used synapses with prominent depression 
that obscures facilitation. We therefore examined synaptic transmis- 
sion at four facilitating synapses: Schaffer collateral synapses between 
hippocampal CA3 and CA1 pyramidal cells? (Fig. 1a), thalamocortical 
synapses between layer 6 cortical pyramidal cells and thalamic relay 


cells”? (Fig. 1b), mossy fibre synapses between dentate granule and 
CA3 cells’ (Fig. 1c), and perforant path synapses between layer II and 
III cells of the entorhinal cortex and dentate granule cells”! (Fig. 1d). 
Immunohistochemistry shows that Syt7 is present in regions where 
these synapses are located (Extended Data Figs 2 and 3). Facilitation is 
often assessed using pairs of closely spaced stimuli. In slices from wild- 
type mice, paired-pulse facilitation resulted in ~2-fold enhancement 
of neurotransmitter release lasting several hundred milliseconds (Fig. 
la-d, black traces). In Syt7-knockout mice, paired-pulse facilitation 
was eliminated (Fig. 1a—d, red traces). Sustained high frequency acti- 
vation produces up to tenfold enhancement in wild-type animals, but 
facilitation is eliminated in knockouts at all synapses except for mossy 
fibre synapses, where the remaining enhancement is consistent with 
use-dependent spike broadening that occurs at this synapse”* (Fig. 
le-h and Extended Data Fig. 4). 

The loss of facilitation in Syt7 knockouts cannot be accounted for 
by slowed recovery from depression reported with Syt7 deletion’, 
because recovery from depression is too slow to influence rapid facil- 
itation strongly, nor can it produce the large increase in release asso- 
ciated with facilitation. There are several possible explanations for the 
loss of facilitation in knockouts: (1) the presynaptic Ca** signal that 
induces facilitation could be altered, (2) the probability of release (p) 
for synaptic vesicles could be increased, which by promoting vesicle 
depletion would indirectly reduce facilitation, or (3) the mechanism 
for facilitation could be disrupted directly. We assessed these possi- 
bilities at the CA3-CA1 synapse. 

Action-potential-evoked increases in presynaptic Ca** consist of a 
large, brief localized Ca?* signal that activates the low-affinity Ca?* 
sensor synaptotagmin 1| to trigger neurotransmitter release*’, and a 
small residual Ca?t signal (Ca;es) that persists for tens of milliseconds 
and has been implicated in facilitation”. It is difficult to measure local 
Ca’* signals that trigger release, but Ca,.; is readily measured. We used 
a low-affinity Ca** indicator to measure the time course of Cayes in 
CA3 presynaptic terminals, because facilitation can be attenuated by 
the accelerated decay of Cares (ref. 4). Cares decayed similarly in wild- 
type and Syt7-knockout animals (Fig. 2a), indicating that the loss of 
facilitation in knockout mice is not a consequence of accelerated Ca,. 
decay. We also used Cares as a measure of Caingux to determine whether 
there are use-dependent changes in Ca”* entry. However, each of two 
closely spaced stimuli evoked the same incremental increase in Ca,.; in 
both wild types and knockouts (Fig. 2b), indicating that use-dependent 
changes in total Cajnqux Cannot account for facilitation. This suggests 
that if changes in Cajnfux contribute to facilitation at this synapse, they 
must be restricted to the small subset of presynaptic calcium channels 
that evoke neurotransmitter release. We repeated the experiment using 
a high-affinity Ca** indicator, in which the degree of saturation during 
paired stimuli can be used to measure the magnitude of Cayes evoked by 
the first stimulus (see Methods). We conclude that Caingux evoked by the 
first stimulus is the same in wild-type and knockout animals (Fig. 2c). 

We further explored the role of Ca?" in facilitation by examining 
the Ca?+-dependence of excitatory postsynaptic currents (EPSCs) 
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Figure 1 | Facilitation is absent in Syt7 knockout mice. 

a-d, Representative traces (top) and average paired-pulse ratio (PPR) at 
different interstimulus intervals (At) (bottom) recorded in slices prepared 
from wild-type (WT; black) and Syt7-knockout (KO; red) animals. 
Postsynaptic responses were recorded using whole-cell voltage clamp 
from hippocampal CA1 pyramidal cells (a), and thalamic relay cells (b). 
fEPSPs were recorded from hippocampal-mossy-fibre to CA3 synapses 
(c), and lateral-performant-path synapses in the dentate gyrus (d). 


and facilitation. Raising extracellular Ca** leads to a steep increase in 
EPSC amplitude (Fig. 2d) but a decrease in facilitation (Fig. 2e, black 
traces), even though high extracellular Ca?* should increase the Cayes 
available to evoke facilitation. This paradox is resolved by realizing 
that increased Ca?* influx increases p, which depletes presynaptic ves- 
icles, saturates release and limits the extent of facilitation. The Ca’*- 
dependence of EPSC amplitudes was unaffected in knockout animals 
(Fig. 2d), but facilitation was absent for all values of external Ca?* 
(Fig. 2e, f). Meanwhile, there was no difference in basal release proper- 
ties measured by the rate of spontaneous EPSCs (Extended Data Fig. 5). 
These findings suggest that the loss of facilitation in knockouts is not 
a consequence of higher initial p, because facilitation was absent even 
when the initial p was strongly attenuated by reducing external Ca?* 
To test further whether initial p is increased in Syt7 knockouts, we 
measured how field excitatory postsynaptic potentials (fEPSPs) scaled 
with stimulus intensity” (Fig. 3a). The slope of the fEPSP versus pre- 
synaptic volley gives a relative measure of p (see Methods), which was 
unchanged in knockouts (Fig. 3b). Moreover, the fEPSP to presynaptic 
volley ratio changed steeply with extracellular Ca’*, showing that this 
method is sensitive to p (Fig. 3c, d). We also assessed p using pharma- 
cological blockade of synaptically activated NMDARs (N-methyl-p- 
aspartate receptors) by the use-dependent blocker MK801 (ref. 25; 
Fig. 3e-g). This approach is widely used to detect changes in p: an 
increase in p leads to more glutamate release, and more activation and 
rapid blockade of NMDARs, while a decrease in p leads to a slower block- 
ade (Extended Data Fig. 6). The rate of blockade of NNUDAR-mediated 
fEPSPs (NMDAR-fEPSPs) was unaffected by Syt7 deletion (Fig. 3e), 
indicating similar initial p. However, when we evoked NMDAR- 
fEPSPs with trains of three stimuli*’, amplitudes decayed more rapidly 
in wild types (Fig. 3f, g), suggesting that Syt7 is required to increase p 
for the second and third stimuli. Thus, initial p and presynaptic Ca”* 


Stimulus frequency (Hz) Stimulus frequency (Hz) 


Vertical scale bars, 100 pA (a, b) and 1001V (c, d). e-h, Synaptic responses 
to 20-Hz trains from the same preparations as a-d (top), normalized 
amplitudes during 20-Hz trains (middle), and normalized responses to 

the tenth stimulus as a function of stimulus frequency (bottom). Peak 

PPR was significantly different for wild-type and Syt7-knockout mice at 
all synapses, as was response}o/response, for 5-50-Hz trains (P< 0.01, 
Student’s t-test). Data represent mean + s.e.m. Number of experiments is 
shown in Extended Data Table 1. 


signalling are unaffected by Syt7 deletion, but knockouts lack the use- 
dependent increase in p that underlies facilitation. This suggests that the 
mechanism underlying facilitation is directly impaired by Syt7 deletion. 

Syt7 is implicated in neuroendocrine release!®, insulin secretion”® 
and exocytosis of lysosomes”, which could all indirectly influence syn- 
aptic transmission in global Syt7 knockouts. Therefore, to determine 
whether Syt7 controls facilitation by acting in presynaptic neurons in 
a cell-autonomous manner, we tested whether viral expression of Syt7 
in CA3 pyramidal cells of Syt7 knockouts rescued facilitation. This 
approach is complicated by our inability to virally transduce all CA3 
pyramidal cells, which prohibits the use of extracellular stimulation 
that would activate some presynaptic cells that express Syt7 and others 
that do not. We overcame this problem with an adeno-associated virus 
(AAV) that drove bicistronic expression of both channelrhodopsin-2 
(ChR2) and Syt7, allowing optical stimulation of only those fibres 
expressing Syt7. 

Using conditions we have previously shown allow facilitation to be 
studied with optogenetic stimulation (see Methods), we confirmed 
that when ChR2 alone was expressed, optical and electrical stimulation 
produced similar facilitation in wild types (Fig. 4a, e, f), and similar 
depression in knockouts (Fig. 4b, e, f). We next used a bicistronic 
vector to express both ChR2 and wild-type Syt7 in knockout animals. 
Light-evoked responses exhibited facilitation, whereas electrically 
evoked responses did not (Fig. 4c, e, f). This suggests that bicistronic 
expression of ChR2 along with a presynaptic protein of interest offers a 
powerful new approach to characterize the effect of gene manipulation 
on presynaptic function within intact neural circuits. When Syt7 was 
expressed in wild-type animals, the peak facilitation was unaffected 
(Fig. 4e, f and Extended Data Fig. 7a). Thus, expressing Syt7 in CA3 
pyramidal cells rescued facilitation in a cell-autonomous manner, with 
facilitation restored only at synapses expressing Syt7 and ChR2. 
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Figure 2 | Facilitation is altered in Syt7-knockout animals despite 
similar presynaptic Ca?" signals. a, Presynaptic Ca,., evoked by a single 
stimulus recorded from Schaffer collateral fibres loaded with a low- 
affinity Ca** indicator (left), and Ca,.; half-decay times (right). NS, not 
significant. b, Cayes signals recorded with low-affinity indicator evoked by 
one or two stimuli (left). The ratio of the increase in Ca;es evoked by the 
first (AF,) and second (AF,) stimuli (right). c, Ca,.5 signals recorded with 
high-affinity indicator evoked by one or two stimuli. d, Average EPSC 
amplitudes for CA3-CA1 synapses recorded in different external Ca”* 
(Ca,) concentrations, normalized to the amplitude in 2mM Cag. e, EPSCs 
recorded in different Cag. Vertical scale bars, 50, 100, 200 and 300 pA in 
0.5, 1, 2 and 3mM Cag, respectively. f, PPR for interstimulus interval of 
20 ms recorded in different Ca,. In 0.5 mM Ca?*, the PPR in knockout 
(1.24 + 0.12) was not significantly different from 1 (P= 0.084, Wilcoxon 
signed rank test). Data represent mean + s.e.m. Number of experiments is 
shown in Extended Data Table 2. 
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Figure 3 | Change in the initial probability of release does not underlie 
the absence of facilitation in Syt7-knockout mice. a, Extracellular 
recordings of presynaptic fibre volley and fEPSP evoked by the indicated 
stimulus intensities. Scale bar, 200 V. b, fEPSP slope plotted against 

fibre volley amplitude, for 20-100 A stimulation. c, fEPSPs recorded 

in 1 and 3mM Ca,. Scale bar, 100,:V. d, Average ratio of the fEPSP to 

the fibre volley in different Cag. e, Top, initial release probability was 
measured by stimulating Schaffer collaterals every 10s while recording 
NMDAR-fEPSPs before and after MK801 bath application. Middle, traces 
averaged from 10 trials before (dark traces), and trials 10-15 after (light 
traces) MK801 application. Bottom, average NUDAR-fEPSP amplitudes 
evoked in the presence of MK801. f, Same as in e but with three stimuli at 
50 Hz every 30s. First response to trains is shown. g, Half-decay times of 
NMDAR-fEPSP amplitudes in the presence of MK801. *P < 0.05, one-way 
analysis of variance (ANOVA) with Tukey’s post-hoc test. Data represent 
mean + s.e.m. Number of experiments shown in Extended Data Table 2. 
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Figure 4 | Viral expression of Syt7 restores facilitation at Schaffer 
collateral synapses. a—d, Top, fluorescence images of yellow fluorescent 
protein (YFP)-tagged ChR2 and Syt7 immunostaining in the CA1 region 
after AAV injection into CA3 to express the indicated proteins in wild- 
type animals (a) or Syt7-KO animals (b-d). PY, stratum pyramidale; 

SR, stratum radiatum. Scale bar, 100 1m. Bottom, EPSCs and PPRs for 
responses evoked electrically (open symbols) and optically (blue symbols). 
Ina and b, only ChR2-YFP was expressed; in c, both ChR2-YFP and wild- 
type Syt7 were expressed (separated by a porcine teschovirus-1 2A (P2A) 
cleavage peptide); and in d, ChR2-YFP and Ca?*-insensitive Syt7(C2A*) 
were expressed. e, f, Summary of PPRs for 50-ms interstimulus interval. 
Asterisks denote significant difference from responses evoked electrically 
in uninjected wild-type animals (e), or optically in wild-type animals 
expressing ChR2 alone (f). *P < 0.05, one-way ANOVA with Tukey’s post- 
hoc test. Data represent mean + s.e.m. Number of experiments is shown on 
bar graphs. 


To determine whether Ca”* binding by Syt7 is important for facilita- 
tion, we assessed whether facilitation is rescued by Syt7 with a mutated 
Ca’ -insensitive C2A domain (Syt7(C2A*)). Previous studies estab- 
lished that Ca”* binding to the C2A domain of Syt7 is required for 
Syt7 to mediate asynchronous release!®. We found that Syt7(C2A*) 
did not rescue facilitation in knockouts (Fig. 4d-f). Moreover, in wild- 
type animals, Syt7(C2A*) expression strongly attenuated facilitation 
(Fig. 4e, f and Extended Data Fig. 7b), suggesting that Syt7(C2A*) 
competes with native Syt7 to suppress facilitation. 

Our results indicate that facilitation requires Ca** binding to the 
C2A domain of Syt7, and also provide insight into the role of Syt7 in 
facilitation. We conclude that Syt7 does not produce facilitation by 
altering the amplitude and time course of Cayes (Fig. 2), by increasing 
initial p (Fig. 3), by acting as a Ca’* buffer (Extended Data Fig. 8), 
or through use-dependent increases in the total Caingux (Extended 
Data Fig. 1b and Fig. 2). The observation that initial p is unaltered in 
Syt7 knockouts indicates that local Caingux is unaffected for the first 
stimulus, but it is difficult to rule out the possibility that Syt7 mediates 
a use-dependent increase in Cainfux through the subset of channels 
that trigger vesicle fusion. There is, however, no evidence for Syt7 
associating with or regulating calcium channels. By contrast, Syt7 is 
known to interact with Syt1 and can mediate vesicle fusion'*'*. The 
most parsimonious explanation is that Syt7 acts as the proposed spe- 
cialized Ca?" sensor to increase p during facilitation. Facilitated release 
exhibits rapid kinetics, suggesting that Syt7 somehow increases the 
probability of Sytl-dependent vesicle fusion. Whether this is through 
a direct interaction of Syt7 with a fast synaptotagmin isoform such as 
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Syt1 remains an open question. It is also unclear whether the recently 
described interaction between Syt7 and calmodulin that promotes 
vesicle replenishment’? is similarly required for facilitation. Finally, 
it is possible that at other synapses facilitation is mediated by addi- 
tional specialized Ca?* sensors, or involves other mechanisms. Further 
studies are needed to clarify these issues. 

Based primarily on theoretical considerations, facilitation is thought 
to influence both information transfer and network dynamics pro- 
foundly. In the hippocampus, the high-pass filtering imposed by 
facilitating synapses may account for the burst firing in place cells 
that encode spatial information”®. In the auditory pathway, facilitation 
is proposed to counteract short-term depression to maintain linear 
transmission of rate-coded sound intensity”. It has even been sug- 
gested that facilitation forms the basis of short-term memory, as facil- 
itating recurrent connections within cortical networks could support 
the persistent activity states associated with working memory”. In 
future studies, the selective elimination of Syt7 from specific cell types 
could allow the first direct tests of the effect of facilitation on neural 
circuits and behaviour. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Animals and viruses. All mice were handled in accordance with NIH guide- 
lines and protocols approved by Harvard Medical School. Syt7 knockout mice?! 
(Jackson Laboratory) and wild-type littermates of either sex were used. Statistical 
tests were not used to predetermine sample size. Blinding and randomization were 
not performed. AAV2/9-hSyn-hChR2(H134R)-EYFP and its pAAV backbone 
(Addgene 26973) were obtained from the University of Pennsylvania Vector Core. 
Complementary DNA encoding the rat Syt7 wild-type o isoform and C2A* mutant 
(D225A, D227A and D233A)!* were provided by T. Bacaj and T. Sudhof. For res- 
cue experiments involving Syt7 with mutated Ca** binding domains, we used the 
mutated C2A* version instead of the C2A*C2B* double mutant, as mutation of 
both C2 domains leads to lower levels of expression. The P2A cleavage sequence** 
and Syt7 were inserted after the ChR2 carboxy terminus in the pAAV backbone 
(Genscript). Plasmid-driven expression of ChR2-YFP and Syt7 was confirmed in 
HEK cells by Syt7 immunostaining and patch-clamp recording of ChR2 photocur- 
rents. AAVs were produced and purified from HEK cells as previously described*’. 
Stereotaxic surgeries were performed as described**. Postnatal day (P) 
18-30 mice were anaesthetized with ketamine/xylazine/acepromazine 
(100/10/3 mgkg~') supplemented with 1-4% isoflurane. Viruses were injected 
through glass capillary needles using a syringe (Hamilton) mounted on a stereo- 
taxic instrument (Kopf). Injection coordinates from lambda were 2.69 mm (ros- 
tral), 3mm (lateral) and 2.8 mm (ventral). One microlitre of virus suspension was 
delivered at a rate of 0.1 plmin~! using a microsyringe pump (WPI; UMP3) and 
microsyringe pump controller (WPI; Micro4). The needle was slowly retracted 
5-10 min after injection, and the scalp incision was closed with gluture. Post- 
injection analgesic (buprenophrine, 0.05 mgkg ') was administered subcuta- 
neously for 48h. 
Acute slice preparation. P30-P60 animals were euthanized under isoflurane 
anaesthesia, 14-30 days after AAV injection. Brains were removed and placed 
in ice-cold solution containing (in mM): 234 sucrose, 25 NaHCO;, 11 glucose, 
7 MgClo, 2.5 KCI, 1.25 NaH2PO, and 0.5 CaCl. Then, 270-\1m-thick trans- 
verse slices (hippocampal recordings) or 250-\1m-thick sagittal slices (thalamic 
recordings) were prepared on a vibrotome (Leica, VT1000s), and a cut was 
made between CA3 and CAI to prevent recurrent excitation. Slices were trans- 
ferred for 30 min to 32°C artificial cerebrospinal solution (ACSF) containing 
(in mM): 125 NaCl, 26 NaHCOs, 25 glucose, 2.5 KCI, 2 CaCly, 1.25 NaH2PO4 
and 1 MgCh, adjusted to 315 mOsm, and allowed to equilibrate to room tem- 
perature for >30 min. Experiments were performed at 33 + 1°C with flow rates 
of 2mlmin7!. 
Electrophysiology. For ChR2 stimulation, 160 mW mm * laser pulses (0.2- 
0.5 ms) from a 100-mW 473 nm laser (OptoEngine, MBL-III) were focused 
through the x60 objective of the microscope (Olympus, BX51WI) to produce 
a 80-j1m diameter spot over the stratum radiatum, >500 1m from the recorded 
cell to avoid activating ChR2 in presynaptic boutons, which can artificially raise 
the probability of release and obscure facilitation*4. Extracellular stimulation was 
performed with a stimulus isolation unit (WPI, A360) using glass monopolar 
electrodes (0.5-1 MQ) filled with ACSF. Stimulus electrodes were positioned 
~500\1m from the recording electrode in the stratum radiatum (Schaffer collat- 
erals), the internal capsule (corticothalamic), the hillus adjacent to the dentate 
granule cell layer (mossy fibres), and the outer molecular layer (lateral per- 
formant path). To ensure that mossy fibre responses were not contaminated by 
associational/commissural inputs the metabotropic glutamate receptor agonist 
DCG-IV (1|1M) was applied at the end of experiments to block mossy fibre 
responses selectively*°. Data were included only if responses were reduced by 
more than 80% (average reduction was 88 + 1% in wild-type and 90 + 2% in 
Syt7-knockout mice), and the amplitude of mossy fibre responses was measured 
after subtracting the response remaining in the presence of DCG-IV. Stimulus 
trials were repeated at 0.1 Hz (0.033 Hz at mossy fibres to avoid potentiation), and 
artefacts were deleted for display. Recordings were acquired using an amplifier 
(Axon Instruments, Multiclamp 700B) controlled by custom software written 
in IgorPro (provided by Matthew Xu-Friedman, SUNY Buffalo), and low-pass 
filtered at 2kHz. Whole-cell recordings were obtained using borosilicate patch 
pipettes (2-5 MQ) pulled with a horizontal puller (Sutter P-97). The internal 
recording solution contained (in mM): 150 Cs-gluconate, 3 KCI, 10 HEPES, 
0.5 EGTA, 3 MgATP, 0.5 NaGTP, 5 phosphocreatine-Tris and 5 phosphocre- 
atine-Na; pH 7.2. Cells were held at —70 mV, and series resistance was mon- 
itored during recordings. fEPSPs were recorded in current-clamp mode with 
ACSF-filled patch pipettes (0.5-1 MQ). Inhibition was blocked with picrotoxin 
(501M), and during fEPSP recordings, CPP (2\1M) and CGP (31M) was added 
to the bath. Approximately 4-10 trials were conducted for each stimulus fre- 
quency, and recordings were averaged over trials. Data in all figures represent 
the mean + s.e.m. Average responses are displayed with double exponential or 


polynomial curves fit in IgorPro. Unless stated otherwise, statistical significance 
was assessed by unpaired two-tailed Student's t-test, or one-way ANOVA fol- 
lowed by Tukey’s post-hoc test. 

Probability of release. To record NUDAR-EPSCs, cells were voltage clamped at 
+40 mV, and the internal solution contained (in mM): 85 Cs-methanesulfonate, 
4 NaCl, 10 HEPES, 0.2 EGTA, 30 BAPTA, 2 MgATP, 0.4 NaGTP, 10 phospho- 
creatine-Na, 25 TEA, 5 QX-314; pH 7.3. For recording NUDAR-fEPSPs, Mg? 
was excluded from ACSF to relieve Mg”* block of NMDA receptors. Picrotoxin 
(100 1M) and NBQX (541M) were added to the bath, and stimulation was con- 
ducted at 0.1 Hz (unless otherwise indicated) for 5 min to obtain a baseline 
response. Stimulation was halted for 10 min while (+-)-MK801 (401M) was added 
and allowed to equilibrate. For experiments involving fEPSPs versus presynaptic 
volley, the postsynaptic response was measured by the slope of the fEPSP, while 
the amplitude of the presynaptic volley was used to determine the number of 
activated fibres. If p increases, the same number of activated presynaptic fibres will 
produce a larger fEPSP. The ratio between fEPSP and volley was determined by 
line fits to the linear regime of the input-output curve of individual experiments 
(20-80 1A stimuli). 

The study of probability of release is complicated because many people 

use p to refer to the probability of release of a vesicle (py) and others refer 
to probability of release from an active zone (Psynapse) that contains N vesi- 
cles in its readily releasable pool. Thus, an increase in the size of the readily 
releasable pool for an active zone can increase Psynapse even if py is unaltered. 
Although MK801 blockade” and fEPSPs versus presynaptic volley~* are both 
widely used methods to detect changes in the probability of release, for both 
approaches it is conceivable (although unlikely) that increases in py could be 
obscured by a perfectly balanced decrease in the readily releasable pool size. 
However, the relationship between EPSC amplitude and extracellular Ca?* 
is similar in wild-type and Syt7-knockout animals. This suggests there is no 
increase in py, which would cause this curve to saturate at lower values of Ca, for 
Syt7-knockout animals. Moreover, the large differences in facilitation in wild- 
type and Syt7-knockout animals were even more pronounced when the prob- 
ability of release was reduced tenfold by lowering Ca, from 2 mM to 0.5mM, 
which is incompatible with an increase in py obscuring facilitation by depleting 
vesicles. 
Ca”* measurements. Ca”+ was measured as described previously’. In brief, 
CA3 fibres were labelled for 3 min using an ACSF-filled pipette containing either 
magnesium green AM or fura-2 AM (2401M) and 1% fast green, placed into 
the border of the CA3-CA1 field. A vacuum pipette placed above the load- 
ing site removed excess indicator. Slices were incubated for at least 1 h and 
imaging was performed in stratum radiatum of CA1 at least 500 j1m from the 
injection site using a 60x objective and custom-built photodiode. Excitation 
was achieved using a tungsten (magnesium green) or xenon (fura-2) lamp. 
Schaffer collaterals were stimulated using a glass electrode placed at least 
300 1m from the imaging site. To prevent recurrent excitation, experiments 
were performed in the presence of NBQX (10}1M), CPP (241M) and picrotoxin 
(501M). 

Magnesium green is a low-affinity calcium indicator*® (Kp = 7 1M) that 
provides an approximately linear measure of Cares (ref. 37). As such it is well 
suited to measuring the time course of presynaptic Cayes (Fig. 2a) and detecting 
changes in Cainftux during successive stimulations (Fig. 2b). However, with the 
bulk loading approach the size of the fluorescence change is proportional to the 
number of stimulated fibres, so the absolute Ca;es signal is not readily quantified 
with magnesium green. By contrast, fura-2 has a high affinity for calcium**”? 
(Kp = 131 nM) so it provides a saturating sublinear response to increases in Cares 
(refs 40-42). This can be used to test for changes in the absolute size of Caingux 
because a change in the Cainfux per stimulus would change the ratio between 
the fluorescence change produced by the first and second stimuli. 
Immunohistochemistry. Two to four weeks after AAV injection, mice were 
anaesthetized with ketamine and transcardially perfused with 4% paraformalde- 
hyde (PFA) in PBS. The brain was removed and post-fixed for 24h. Slices (501m 
thick) were permeabilized (PBS plus 0.4% Triton X-100) for 30 min and then 
prepared in blocking solution (PBS plus 0.2% Triton X-100 and 2% normal goat 
serum; PBST) for 30 min at room temperature. Slices were incubated overnight 
at 4°C in PBST with primary antibodies (anti-Syt7 (Synaptic Systems, 105173), 
1g ml}; 1:200, targeting amino acids 46-133 of Syt7«, anti-vGlutl (Synaptic 
Systems, 135304), lug ml~!; 1:500, and anti-calbindin-D28k (Sigma Aldrich, 
C9848), 1g ml7}; 1:500), followed by incubation with secondary antibodies 
in PBST for 2h at room temperature. For both wild-type and Syt7-knockout 
mice, images from each brain region were acquired on a laser scanning confocal 
(Olympus, FluoView1200) using the same laser/microscope settings and pro- 
cessed in ImageJ identically. 
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Extended Data Figure 1 | Possible mechanisms for synaptic facilitation. 
a-d, It is established that calcium has an important role in synaptic 
facilitation, and several mechanisms have been proposed that involve 
different aspects of calcium signalling”. Here we discuss the calcium 
signals that evoke rapid vesicle fusion, and also those thought to be 
involved in facilitation (a), and three mechanisms of facilitation are 
presented schematically** (b-d). a, To understand the mechanisms that 
have been proposed to account for facilitation, it is important to appreciate 
different aspects of presynaptic calcium signalling. Calcium signals 

are complex, but can be approximated by two components. An action 
potential opens calcium channels for less than a millisecond, and near 
open channels the calcium levels reach tens of micromolar. Release sites 
near calcium channels experience high local calcium levels (Cajo.qi) that 
are highly dependent on the distance from open calcium channels. Cajocal 
can be reduced by high concentrations of fast calcium buffers that rapidly 
bind calcium. In addition, there is a residual calcium signal (Cares) that 
results from calcium equilibrating within presynaptic terminals, before 
calcium is gradually removed over tens to hundreds of milliseconds. The 
amplitude of Cayes (and also total influx of Ca**, Cainfux) is determined 
by all of the calcium channels that open, not only those that produce 
Cajocal that drives release, and after initial equilibration Cares is roughly 
uniform throughout the presynaptic bouton. It is generally accepted that 
fast synaptic transmission is produced by calcium binding to Syt1, Syt2 

or Syt9, which have low-affinity binding sites, fast kinetics, and require 
the binding of multiple calcium ions”**. The time course of release 
follows the time course of calcium channel opening, but with a brief delay 
(<1 ms). Cayes after a single stimulus is much smaller than Cajoca). Typical 
fluorescence-based approaches to measure calcium readily detect Cares, 
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but are insensitive to Cajo¢a, which is too localized and short-lived to 
measure. Note the y axis is logarithmic to show both Cajocqi and Cayes in a, 
but not in b-d. b, For one mechanism of facilitation, a fast calcium buffer 
is present in presynaptic terminals that binds calcium and reduces Cajocal. 
Stimulation twice in rapid succession results in the same calcium influx for 
both stimuli. If there is no fast presynaptic buffer, the amplitudes of Cajocai 
and the EPSCs are the same for both stimuli (red traces). Ifa fast high- 
affinity buffer is present (black traces), it reduces the initial Caj,,.) and 
reduces the amplitude of the initial EPSC, but if enough calcium enters 
and binds to the buffer, it reduces its ability to buffer calcium. As a result, 
the second stimulus produces larger Cajocaj than the first, and the EPSC is 
facilitated. c, A second possible mechanism is that more calcium enters for 
the second stimulus, and as a result there is more neurotransmitter release. 
This could arise from a spike broadening, or from the modulation of 
calcium channels. It is possible that influx through all calcium channels in 
the presynaptic terminal would be increased, in which case both Cay; and 
Cajocai Would be increased. It is also possible that the only calcium channels 
that are modulated are the subset that produce Cajo-qi that triggers release, 
in which case Cares would not be significantly increased. d, Finally, it is 
possible that there is a specialized calcium sensor that produces facilitation 
that is distinct from Sytl (refs 2, 4, 45). Previous studies have shown that 
such a sensor would need to be sensitive to Ca;e, based on the observation 
that facilitation is altered at some synapses by manipulations that affect 
Cazes Without affecting Cajoca). According to this scheme, release is 
mediated by Syt1 but calcium binding to a second sensor would increase p. 
The sensor is sufficiently slow that it does not influence release evoked 

by the first stimulus, but it is able to influence release evoked by a second 
stimulus. 
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Extended Data Figure 2 | Immunohistochemistry of Syt7 expression at 
four different synapses. a—d, Fluorescent images of immunostaining for 
vGlutl (top) and syt7 (bottom) in slices from wild-type and Syt7-knockout 
animals, showing the stratum radiatum (SR) of hippocampal CA1 region 
(a), the ventral thalamus (b), mossy fibres (MF) in hippocampal CA3 (c), 
and the lateral and medial performant paths (LPP and MPP) in the outer 
molecular layer of the dentate gyrus (d). Notably, Syt7 expression in wild- 
type animals was higher in the LPP, where synapses exhibit facilitation, 
than in the MPP, where synapses exhibit depression. Scale bar, 50 1m. 
The presence of Syt7 labelling in regions containing CA3-CA1 

synapses, layer 6 to thalamus synapses, mossy fibres synapses and 
LPP-granule-cell synapses that are also colabelled with antibodies to 

the presynaptic marker for glutamatergic synapses vGlut1, suggests that 
Syt7 is located presynaptically at these synapses. It is, however, difficult 
to obtain sufficient resolution with confocal microscopy in brain slices 

to unambiguously establish that Syt7 is located presynaptically at these 
synapses. Importantly, the Allen Brain atlas (http://www.brain-map.org) 
suggests that the presynaptic cells for these synapses contain messenger 
RNA for Syt7. Lastly, immunoelectron microscopy revealed selective 
staining of presynaptic boutons in the CA1 region of the hippocampus’®. 
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Extended Data Figure 3 | Immunohistochemistry of Syt7 and 

calbindin expression at mossy fibre synapses. Fluorescent images of 
immunostaining for calbindin-D28k, which predominantly labels mossy 
fibres in the CA3 region of the hippocampus*“* (top) and Syt7 (bottom) in 
slices from wild-type and Syt7-knockout animals. Colocalization of Syt7 
and calbindin staining in wild-type animals provides further support for 
the expression of Syt7 in mossy fibre terminals. Scale bar, 20 1m. 
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Extended Data Figure 4 | Loss of facilitation in Syt7-knockout animals 
at multiple frequencies. Average normalized synaptic responses evoked 
by extracellular stimulation with trains at frequencies from 5 to 50 Hz 

at four synapses in slices from wild-type and Syt7-knockout animals. 
Enhancement during trains was eliminated for all synapses other than 
mossy fibre synapses, where significant enhancement was present by the 
fifth stimulus for 5 Hz and 10 Hz, the third stimulus for 20 Hz, and the 


sixth stimulus for 50 Hz (compared to 1 by a Wilcoxon signed rank test, 
P<0.05). This indicates that another form of synaptic enhancement 
gradually builds during repetitive activation and is consistent with a 
specialized form of synaptic enhancement that has been described at 
mossy fibre synapses in which spike broadening gradually builds during 
repetitive activation and leads to increased calcium influx. The numbers of 
experiments are shown in Extended Data Table 1. 
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Extended Data Figure 5 | Spontaneous release is similar in wild-type b, Representative sEPSCs, averaged from >50 events recorded in 
and Syt7-knockout animals. a, Representative spontaneous EPSCs wild-type and knockout animals. Vertical scale bars, 10 pA. c, d, Average 
(sEPSCs) recorded from voltage-clamped hippocampal CA] cells in sEPSC amplitude (c) and frequency (d) in wild-type (n = 16) and 
wild-type (black) and knockout (red) animals. Vertical scale bars, 20 pA. Syt7-knockout animals (n= 18). 
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Extended Data Figure 6 | MK801 blockade of NUDAR-mediated EPSCs of fifteenth to twentieth stimuli). Vertical scale bars, 100 pA. b, Average 
reveals similar initial release probability in wild-type and knockout NMDAR-EPSCs recorded in the presence of MK801, normalized to the first 
synapses. a, Representative NMDAR-EPSCs recorded in wild-type stimulus. c, Half-decay times of NUDAR-EPSC amplitudes. *P < 0.05, 
and knockout animals before the application of MK801 (average of 10 one-way ANOVA with Tukey’s post-hoc test. Data represent mean + s.e.m. 
traces) and after stimulation in the presence of MK801 (average response The number of experiments is shown in Extended Data Table 2. 
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Extended Data Figure 7 | Effect of virally expressed Syt7 wild-type 

and Syt7(C2A*) in wild-type animals. a, b, Top, AAV was injected into 
the hippocampal CA3 region in wild-type animals to express ChR2 and 
either wild-type Syt7 (a) or Syt7(C2A*). Bottom, representative EPSCs 
and average paired-pulse ratios for responses evoked electrically and 
optically in wild-type slices with AAV-driven expression of wild-type Syt7 
(electrical, n = 12; optical, nm = 13) (a) and Syt7(C2A*) (electrical, n = 5; 
optical, n = 13) (b). Vertical scale bars, 100 pA. 
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Extended Data Figure 8 | Evidence suggests that Syt7 does not produce 
facilitation by acting as a local calcium buffer at the CA3-CA1 
synapse. This graph illustrates the general relationship between PPR 

and external calcium for synapses in which buffer saturation produces 
facilitation (green) and for facilitation observed at the CA3-CA1 

synapse and many other synapses (black)’. It has been shown previously 
that the for buffer saturation mechanism (Extended Data Fig. 1b) the 
amplitude of facilitation is reduced when Cainfux is reduced by lowering 
external calcium’. This can be understood by considering that this form 
of facilitation is thought to require sufficient Caingux to saturate the 
endogenous buffer, and thereby reduce its ability to buffer calcium for 
subsequent stimuli. If Cainqux is low, then there is insufficient calcium 
entry to bind very much of the endogenous buffer, and little facilitation 
would result. In addition, as shown in Extended Data Fig. 1, for a calcium 
buffer to produce facilitation it would need to buffer calcium sufficiently 
that it would reduce initial p. We have shown, however, that p is unaltered 
in Syt7 knockouts. This is perhaps not surprising in light of the fact that 
Syt7 is thought to be located on the plasma membrane, and in cases 
where this type of facilitation has been observed it is associated with high 
concentrations of a fast cytosolic buffer’. 
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Extended Data Table 1 | Number of electrophysiological recordings from wild-type and Syt7-knockout animals 


Figure Synapse Experiment Genotype # of Recordings # of Animals 
Figure 1a Schaffer collateral Paired-pulse WT 13 6 
KO 17 5 
Figure 1b Corticothalamic Paired-pulse WT 23 8 
KO 23 5 
Figure 1c Hippocampal mossy fibre Paired-pulse WT 10 6 
KO 8 4 
Figure 1d Lateral perforant path Paired-pulse WT 6 3 
KO 13 3 
Figure 1e, Extended Schaffer collateral Trains (2-50 Hz) WT 14 6 
Data Figure 4 Trains (2-50 Hz) KO 17 6 
Figure 1f, Extended Corticothalamic Trains (2-50 Hz) WT 16 8 
Data Figure 4 Train (2 Hz) KO 5 2 
Trains (5-50 Hz) KO 12 3 
Figure 1g, Extended Hippocampal mossy fibre Train (2 Hz) WT 3 2 
Data Figure 4 Train (5 Hz) WT ri 4 
Train (10-50 Hz) WT 10 5 
Train (2-5 Hz) KO 3 2 
Train (10-50 Hz) KO 8 4 
Figure 1h, Extended Lateral perforant path Train (2-10 Hz) WT 3 2 
Data Figure 4 Train (20-50 Hz) WT 6 3 
Train (2-5 Hz) KO 5 Z 
Train (10 Hz) KO 10 3 
Train (20-50 Hz) KO 13 3 
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Extended Data Table 2 | Number of experiments related to the Ca2+-dependence of probability of release 
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Figure Experiment Condition Genotype # of Recordings # of Animals 
Figure 2a,b Presynaptic Magnesium Green WT 11 2 
Ca** imaging KO 10 2 
Figure 2c Presynaptic Fura-2 WT 14 3 
Ca** imaging KO 10 2 
Figure 2d-f Ca**-dependence of 0.5 mM Ca WT 12 5 
CA3-CA1 EPSC 1mMCa WT 9 4 
2mM Ca * WT 15 6 
3mM Ca WT 6 2 
0.5 mM Ca KO 8 4 
1mMCa KO 7 6 
2mMCa * KO 10 8 
3mM Ca KO 4 2 
Figure 3a,b fEPSP vs. fibre volley 20-100 UA stimulation WT 44 11 
; KO 25 8 
Figure 3c,d Ca**-dependence of 0.5 mM Ca WT 4 2 
CA3-CA1 fEPSP 1mMCa WT 11 5. 
2mMCa * WT 11 5 
3mM Ca WT 8 3 
0.5 mM Ca KO 4 4 
1mMCa KO 8 4 
2mMCa * KO 9 6 
3mMCa KO 6 3 
Figure 3e-g MK801 blockade 2 mM Ca, single stim WT 6 2 
of NMDAR-fEPSP 2mM Ca, triple stim WT 5 3 
2 mM Ca, single stim KO 6 3 
2mM Ca, triple stim KO 4 3 
Extended Data MK801 blockade 1mMCa WT 14 3 
Figure 6 of NMDAR-EPSC 2mMCa WT 11 4 
3mMCa WT 3 2 
2mMCa KO 9 4 


*To normalize responses in different Ca?* concentrations, all Ca2+-dependence experiments included recordings in 2mM Ca?* followed by wash in of different Ca2* concentrations. 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


doi:10.1038/nature16483 


The C. elegans adult neuronal IIS/FOXO 
transcriptome reveals adult phenotype regulators 


Rachel Kaletsky!*, Vanisha Lakhina!*, Rachel Arey!, April Williams!, Jessica Landis', Jasmine Ashraf! & Coleen T. Murphy! 


Insulin/insulin-like growth factor signalling (IIS) is a critical 
regulator of an organism’s most important biological decisions 
from growth, development, and metabolism to reproduction and 
longevity. It primarily does so through the activity of the DAF-16 
transcription factor (forkhead box O (FOXO) homologue), 
whose global targets were identified in Caenorhabditis elegans 
using whole-worm transcriptional analyses more than a decade 
ago!. IIS and FOXO also regulate important neuronal and adult 
behavioural phenotypes, such as the maintenance of memory” and 
axon regeneration? with age, in both mammals‘ and C. elegans, 
but the neuron-specific IIS/FOXO targets that regulate these 
functions are still unknown. By isolating adult C. elegans neurons 
for transcriptional profiling, we identified both the wild-type 
and IIS/FOXO mutant adult neuronal transcriptomes for the 
first time. IIS/FOXO neuron-specific targets are distinct from 
canonical IIS/FOXO-regulated longevity and metabolism targets, 
and are required for extended memory in IIS daf-2 mutants. The 
activity of the forkhead transcription factor FKH-9 in neurons is 
required for the ability of daf-2 mutants to regenerate axons with 
age, and its activity in non-neuronal tissues is required for the long 
lifespan of daf-2 mutants. Together, neuron-specific and canonical 
IIS/FOXO-regulated targets enable the coordinated extension of 
neuronal activities, metabolism, and longevity under low-insulin 
signalling conditions. 

The C. elegans IIS pathway acts both cell autonomously and non- 
autonomously to control longevity, growth, dauer formation, metab- 
olism, and reproduction*”’ through its regulation of the nuclear 
localization and transcriptional activation of DAF-16 (also known as 
FOXO). The canonical IIS/FOXO gene set, which identified primar- 
ily intestinal and hypodermal targets (Extended Data Fig. 1a, b)'*”, 
has been instructive in our understanding of how insulin signalling 
regulates a diverse range of activities, including metabolism, auto- 
phagy, stress resistance, and proteostasis. However, IIS mutants also 
exhibit daf- 16-dependent neuronal phenotypes, including extended 
positive olfactory learning’, increased short- and long-term associative 
memory’, increased thermotaxis learning", improved neuronal mor- 
phology maintenance!!”, and improved axon regeneration®. These 
phenotypes are unlikely to be regulated by the known intestinal and 
hypodermal IIS/FOXO targets’*. Therefore, to understand how IIS 
daf-2 mutant animals extend behavioural functionality, we must iden- 
tify the neuronal targets of FOXO/DAF- 16. 

We first profiled the expression of daf-16;daf-2 mutant worms 
with daf-16 rescued in specific tissues® (Supplementary Table 1). 
Intestinal daf-16 rescue correlates best with whole-worm profiles 
(Extended Data Fig. la, c). By contrast, neuronal daf-16 rescue pro- 
files are anti-correlated with the intestinal DAF-16 and whole-worm 
profiles (Extended Data Fig. 1a, c). Surprisingly, many genes induced 
by neuronal DAF-16 rescue are expressed (WormBase) or predicted 
to be expressed in non-neuronal tissues!> (Extended Data Fig. 1d), 
and have non-neuronal functions (for example, collagens‘; Extended 


Data Fig. 1b, e, Supplementary Table 2). Thus, whole-worm transcrip- 
tional analyses of neuronally rescued DAF-16 failed to reveal targets 
that account for daf-16-dependent age-related behaviours of daf-2 
mutants. Therefore, we needed to specifically examine transcription 
in IJS-mutant neurons. 

The tough outer cuticle prevents dissociation of adult tissues!>, thus 
the wild-type adult neuronal transcriptome has not been described. 
To solve this problem, we used rapid, chilled chemomechanical dis- 
ruption followed immediately by fluorescence-activated cell sorting 
(FACS) to isolate neurons marked with green fluorescent protein 
(GFP) from wild-type worms, then RNA-sequenced these isolated cells 
(Fig. la—c, Extended Data Fig. 2a-c, f, g, Supplementary Table 3). This 
method is gentle enough to preserve the integrity of cells and some 
neurites (Extended Data Fig. 2a), does not involve cell culturing before 
FACS, in contrast to previous methods'®, and does not affect transcrip- 
tion (as shown by actinomycin D treatment; Fig. 1b, Extended Data 
Fig. 2d, e, Supplementary Table 4). Downsampling analysis showed 
that sufficient sequencing depth was achieved (Extended Data Fig. 2h). 

We compared gene expression in isolated wild-type neurons with 
whole-worm expression to identify genes that are enriched in neu- 
rons (Fig. la—c). Of the 1,507 ‘neuron-enriched’ genes (false discov- 
ery rate (FDR) <0.1; Supplementary Table 3; Fig. la, b), only 4% 
have previously described expression patterns exclusively in non- 
neuronal tissues, and “Neuron is the only significantly enriched tis- 
sue (Fig. 1c, Extended Data Fig. 2f), indicating that the method is 
highly selective for neuronal transcripts. Gene promoter-GFP tests of 
previously uncharacterized genes from our neuron-enriched list con- 
firmed neuronal expression, with no bias for particular neuron types 
(Extended Data Fig. 3a). We also detected genes previously reported 
to be expressed only in single neurons or small subsets of neurons, 
including glr-3 (in the RIA neuron), ttx-3 (interneuron ATY/AIA) and 
npr-14 (neuron AIY) (WormBase). 

The wild-type neuron-enriched set includes synaptic machin- 
ery, ion channels, neurotransmitters, and signalling components 
(Supplementary Table 3), as well as >700 previously uncharacterized 
genes; these genes are predicted to have ‘neuronal’-like character and 
function (Fig. 1d). Comparison of the wild-type embryonic and larval 
neuronal transcriptomes with the adult neuronal transcriptome at the 
same FDR revealed a shift in functional categories from developmental 
processes to neuronal function/behaviour in the adult neuronal tran- 
scriptome (Fig. le, Extended Data Fig. 3b, c, Supplementary Table 5), 
suggesting that previous isolation methods’®, either due to early devel- 
opmental stage isolation or to re-culturing, biased expression towards 
developmental genes rather than neuronal/behavioural genes. 

To identify adult neuronal IIS/FOXO targets, we sequenced RNA 
from isolated daf-2 and daf-16;daf-2 mutant neurons on day 1 of adult- 
hood (Fig. 2a, Extended Data Fig. 4, Supplementary Table 6, 8). The 
IIS/FOXO neuron-isolated gene set is enriched for neuronal expres- 
sion: 86% and 92% of the up- and downregulated genes, respectively, 
are expressed in wild-type neurons. While several top Class I gene 
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Figure 1 | Identification of neuronal IIS/FOXO targets requires neuronal 
isolation. a, Volcano plot of neuron-expressed relative to whole-worm- 
expressed genes obtained by neuron-specific RNA sequencing of adult 
wild-type animals. N=3 biological replicates (wild-type neurons) and 

2 biological replicates (whole worm). b, Neuron-expressed and -enriched 
genes are not influenced by cell isolation: treatment with the transcription 
inhibitor actinomycin D affected only 0.22% of all neuronal genes 


targets of DAF- 16, including hil-1, sip-1, mtl-1, nnt-1, ins-6, and daf-16 
itself, were upregulated in both daf-2 mutant neurons and daf-2 mutant 
whole worms (Group B; Fig. 2b), most of the IIS/FOXO neuronally 
regulated set differs from the canonical whole-worm IIS/FOXOs set!® 
(Fig. 2b). Specifically, in contrast to the metabolism-dominated func- 
tions of canonical whole-worm IIS/FOXO targets'’, the neuronal 
IIS set gene ontology terms reflect neuron-like functions (Extended 
Data Fig. 5b): serpentine receptors, G protein-coupled receptors, syn- 
taxin, globins, kinesins, insulins, ion channels, potassium channels, 
seven-transmembrane receptors, the NPR-1 neuropeptide receptor, 
and the SER-3 octopamine receptor are upregulated in daf-2 neurons 
(Supplementary Table 6). A few genes (fat-3 and crh-1,a CREB homo- 
logue) are upregulated in daf-2 neurons but downregulated in whole 
daf-2 animals. 

The IIS/FOXO downregulated set includes serpentine receptors, 
guanylate cyclases, signalling peptides and receptors (neuropeptide- 
like proteins, FMRF-like peptides and neuropeptides), and the 
vesicle trafficking G protein rab-28 (Supplementary Table 6). 
Expression of the sensory neuron cilia protein IFTA-2, which co-localizes 
with DAF-2 and whose loss increases lifespan'’, is downregulated in 
daf-2 mutants, consistent with the longevity of daf-2 and ciliated sen- 
sory neuron mutants’®. Similarly, sams-1 (S-adenosyl methionine syn- 
thetase), which is downregulated under long-lived dietary restriction 
conditions!’, and sma-5 and dbl-1, components of TGF-beta pathways 
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(Supplementary Table 4). c, Tissue expression prediction of wild-type 

adult neuron-enriched genes. Mean +s.e.m. d, GO terms highlight the 
neuronal characteristics of both all and previously uncharacterized neuron- 
enriched genes. e, Embryonic’®, larval’® and adult neuron-enriched genes 
and significant GO terms transition from developmental to neuronal and 
behavioural functions (Supplementary Table 5); FDR <10% for all gene sets. 


linked with IIS””°, are downregulated, perhaps coordinating the 
longevity and reproductive output of these pathways. 

Unlike canonical IIS/FOXO targets!, neuronal IIS/FOXO gene 
promoters are not enriched for the DBE (DAF-16 binding ele- 
ment, GTAAAt/cA), but the overlapping, upregulated (Group B) 
targets’ promoters contain twice as many DBEs (Extended Data 
Fig. 5a). The overlapping downregulated (Group F) targets are 
enriched for the PQM-1/DAE motif (CTTATCA, see refs 1, 8; 
Supplementary Table 7). DAF-16 may regulate neuronal activities 
indirectly through activation of ~60 IIS/FOXO-upregulated transcrip- 
tion factors (Supplementary Table 6). 

We next tested the roles of top-scoring genes in daf-2-regulated 
neuronal phenotypes. Long-term and short-term associative memory 
are both extended in daf-2 mutants in a daf-16-dependent manner” 
(Extended Data Fig. 6). The bZIP transcription factor CREB, which 
is required for long-term memory in many organisms, including 
C. elegans’, is upregulated by IIS/FOXO in neurons (Supplementary 
Table 6), correlating with the increased long-term memory of daf-2 
mutants””!. However, short-term associative memory (STAM; Fig. 2c) 
is CREB-independent?’, and the genes that enable STAM extension in 
daf-2 mutants are unknown. While the DAF-16 non-neuronal target 
sod-3 had no effect on the extended STAM of daf-2 mutants (Fig. 2c, 
Extended Data Fig. 6b-d), knockdown of 8 of the 10 top-ranked, 
upregulated IIS/FOXO targets significantly decreased the STAM of 
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Figure 2 | RNA-seq transcriptional profile of isolated neurons IIS/FOXO target gene RNAi (d, e). d, Learning indices relative 


reveals IIS/FOXO neuronal transcriptome. a, Volcano plot of 
daf-2-regulated, daf-16-dependent up- (red) and downregulated 

(green) neuronal genes (P < 0.05, N= 4 biological replicates per 

strain). b, Comparison of whole-worm (Class 1)* vs neuronal-IIS/FOXO 
targets. P values: hypergeometric distributions. c-e, Short-term 
associative memory (STAM) assays. c, Schematic of STAM assay and 
chemotaxis profiles of daf-2 treated with sod-3 (c) or neuronal 
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to control RNAi at 3h post-training of daf-2 treated with 
adult-only (green) or whole-life (blue) neuronal IIS/FOXO target 
gene RNAi. Mean + s.e.m., *P< 0.05, **P< 0.01, ***P< 0.001, 
** EP < (0001, two-way repeated measures ANOVA, Bonferroni 
post hoc tests. At least 3 biological replicates were performed for 
all STAM assays. 


Figure 3 | FKH-9 is a direct target of DAF-16 and is expressed in 
mechanosensory neurons. a, b, daf-16 is required for enhanced day 5 
axon regeneration in daf-2 mutants, mean +s.e.m., *P < 0.05, Fisher’s 
exact test, N= 26 (wild-type), 36 (daf-2) and 16 (daf-16;daf-2), 

2 biological replicates. c, Known larval regeneration genes are significantly 
enriched in the day 1 adult mechanosensory transcriptome. 63 genes 

are both DAF-16 targets and expressed in mechanosensory neurons 

(FDR < 5%; 3 biological replicates). d, fkh-9 messenger RNA levels are 
higher in aged daf-2 compared to wild type in a daf-16-dependent manner. 
N=4 biological replicates, two-way ANOVA, Bonferroni post hoc tests. 

e, Chromatin immunoprecipitation of DAF-16-GFP worms with and 
without heat shock, which mobilizes DAF-16 into the nucleus. DAF-16 
binds to the sod-3 promoter but not its 3’ UTR, and to the fkh-9 promoter 
at multiple locations (Extended Data Fig. 8). Fold enrichment relative 

to wild-type (not expressing DAF-16-GFP) is shown (mean + s.e.m., 
two-tailed t-test, N= 3 biological replicates). f, Neuronal FKH-9-GFP 
(fkh-9p::fkh-9::gfp) expression in daf-2 compared to wild type. N= 25 
animals. Mean + s.e.m., two-tailed t-test. d—-f, *P < 0.05, **P< 0.01, 

#EEP < 0.001. 
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Figure 4 | FKH-9 is required for improved axon regeneration, 
short-term associative memory and lifespan in daf-2 mutants. 

a, fkh-9 knockdown reduces axon regeneration of day 5 daf-2 mutants, 

as does daf-16 knockdown. Mean + s.e.m., *P < 0.05, Fisher’s exact test, 
N= 34 (control), 33 (fkh-9) and 31 (daf-16), 4 biological replicates. 

b, c, Neuronally-expressed fkh-9 rescues day 5 axon regeneration in 
daf-2;fkh-9 mutants. Mean + s.e.m., *P < 0.05, Fisher’s exact test, N= 20 
(daf-2), 19 (daf-2; fkh-9) and 35 (daf-2; fkh-9; Punc-119::fkh-9B, 

2 biological replicates. d, fkh-9 is required for enhanced memory in adult- 
only RNAi-treated daf-2 mutant worms. e, Neuronally-expressed fkh-9 


daf-2(e1370) (Fig. 2d, e), both in whole-life and adult-only RNA inter- 
ference (RNAi) tests. (Neuronal RNAi is effective in learning, STAM, 
and LTAM tests”!.) The variety of genes (ion channels, transcription 
factors, G-proteins, vesicle fusion proteins) required for daf-2 mutants’ 
extended STAM suggests that decreased insulin signalling affects a 
broad network of memory extension genes. Several of these genes are 
also required for learning and memory in wild type (Extended Data 
Fig. 6g), suggesting that daf-2 mutants maintain neuronal function, 
rather than using an alternative short-term memory mechanism. 
daf-2 mutants also maintain motor neuron axon regeneration ability 
with age in a daf-16-dependent manner’, and we found this is also true 
for mechanosensory neurons (Fig. 3a, b, Extended Data Fig. 7a-d). 
To identify factors that enable axon regeneration with age, we iso- 
lated and RNA-sequenced six adult mechanosensory neurons (Fig. 3c, 
Supplementary Table 9); this set includes 94 known larval regenera- 
tion genes from limited candidate screens”” (P< 1.82 x 10-7°). To find 
daf-2/daf-16-dependent axon regeneration candidates, we identified 
mechanosensory genes that are also regulated by neuronal IIS/FOXO 
(Fig. 3c, Supplementary Table 9; P< 0.002). The forkhead transcrip- 
tion factor FKH-9 is a neuronal IIS/FOXO target (Supplementary 
Table 6) and a canonical Class I target!, and is expressed in 
mechanosensory neurons (Supplementary Table 9). The fkh-9 pro- 
moter is occupied by DAF-16, which we confirmed by chromatin 
immunoprecipitation followed by quantitative PCR (ChIP-qPCR; 
Fig. 3e, Extended Data Fig. 8a, b). FKH-9-GFP localized to nuclei, 


rescues extended STAM in daf-2;fkh-9 mutants with defective learning 
and memory. Mean +s.e.m., **P < 0.01, ***P < 0.001, ****P < 0.0001, 
two-way repeated measures ANOVA, Bonferroni post hoc tests. f, Adult- 
specific fkh-9 RNAi treatment reduces daf-2 mutant lifespan. Median 
lifespan: control RNAi 42 days, fkh-9 RNAi 21 days, daf-16 RNAi 21 days. 
P<0.0001 for control RNAi vs daf-16 RNAi and control vs fkh-9 RNAi, 
log-rank test. N= 144 worms per strain. g, Integrative Multi-species 
Prediction (IMP; see ref. 30) network analysis of DAF-16 neuronal target 
genes with STAM phenotypes (red circles). 


and neurons were the primary site of differential FKH-9-GFP levels 
in daf-2 mutants (Fig. 3f, Extended Data Fig. 8c), all suggesting a role 
for FKH-9 in daf-2/daf-16-mediated neuronal function. 

While there is no effect on the first day of adulthood (Extended Data 
Fig. 7e, f), loss of fkh-9 severely impairs axon regeneration ability in 
aged (day 5) daf-2 mutants (Fig. 4a), correlating with an increased 
difference in fkh-9 expression levels between wild-type and daf-2 
(Fig. 3d). Pan-neuronal fkh-9 expression rescues the ability of day 5 
daf-2;fkh-9 worms to regenerate PLM axons (Fig. 4b, c). fkh-9 levels 
are critical for neuron morphology, as fkh-9 neuronal overexpression 
causes axonal defects (Extended Data Fig. 7g). 

Adult-specific and whole-life reduction of fkh-9 also severely 
impaired extended STAM of daf-2 mutants (Fig. 4d, Extended Data 
Fig. 9). daf-2;fkh-9 double mutants were defective in both STAM 
and learning, and neuronal fkh-9 expression rescued these defects 
(Fig. 4e, Extended Data Fig. 9d, e), suggesting that fkh-9 is required 
for extended memory and normal neuronal development in daf-2 
mutants. Day 1 and 5 fkh-9 expression levels correlated with STAM 
and axon regeneration (Fig. 3d). fkh-9 reduction delayed devel- 
opment, and reduction during adulthood caused severe matricide 
(Extended Data Fig. 10a-c). fkh-9 knockdown in adult daf-2 worms 
treated with FUdR (5-fluoro-2’- deoxyuridine) to block matricide”® 
significantly shortened lifespan (40-50%; Fig. 4f). Pan-neuronal fkh-9 
expression did not rescue lifespan (Extended Data Fig. 10d), sug- 
gesting that FKH-9 acts in non-neuronal tissues to regulate lifespan. 
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Thus, IIS/FOXO-regulated FKH-9 function is important for both neu- 
ronal and non-neuronal growth and development, as well as adult 
memory and axon regeneration. Interestingly, the FKH-9 mammalian 
homologue FOXG1 is required for axon outgrowth”? and is the most 
highly-induced gene in spinal cords treated with radial glial cell trans- 
plant following spinal cord injury”. 

Network analysis using fkh-9 and the other 8 neuronal DAF-16 
STAM genes (Fig. 4g, Supplementary Table 10) identified casy-1, 
which is required for several forms of associative learning and 
memory””>’’, apl-1, the C. elegans orthologue of amyloid precursor 
protein (APP) that can disrupt sensory plasticity*’, and dlk-1, the only 
previously known regulator of age-dependent axon regeneration*””. 
Additionally, genes involved in neuronal degeneration (mec-17), neu- 
ronal development (egl-44, sem-4), neuronal function (egl-21, rcn-1, 
vab-9, cysl-1), synaptic regulation and function (cab-1, hlb-1, magu-4, 
sph-1, unc-64), and axon outgrowth (unc-14) and regeneration (egl-8, 
fos-1, pmk-3), were connected to the STAM genes. PQM-1 (ref. 8), 
whose motif (DAE) is overrepresented in neuronal IIS target promoters, 
and other IIS (akt-2, dct-6, hlh-30), TGF-6 (daf-14, sma-4, crm-1, 
sma-9, sma-1, sta-1), and MAPK pathway (vhp-1, pmk-3) components 
emerged in the network. Transcriptional regulation by IIS/FOXO 
and its targets may lead to broader, indirect transcriptional and non- 
transcriptional regulation of genes with important neuronal functions. 

Plasticity in development, reproduction and longevity allows organ- 
isms to respond appropriately to nutrient availability and changes in 
their environment. The IIS pathway is a critical mediator of these deci- 
sions, with FOXO selecting transcriptional targets to execute specific 
biochemical functions in each tissue, including factors that maintain 
cognitive function with age. daf-2 mutant worms maintain neuronal 
behaviours with age by using a set of transcriptional targets that are 
distinct from previously identified metabolic and stress resistance tar- 
gets expressed in other tissues. These genes may regulate additional 
neuronal targets through non-transcriptional mechanisms (Fig. 4g). 
The regulation of tissue-specific transcriptional programs is important 
to coordinate phenotypic responses, extending neuronal abilities in con- 
cert with the extended longevity and reproductive span of daf-2 mutants. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 
Adult cell isolation. Day 1 adult neuronally GFP-labelled worms (Punc119::GFP 
or Pmec-4::GFP) were prepared for cell isolation as previously described'* with 
modifications (Extended Data Fig. 2). Synchronized adult worms were washed 
with M9 buffer to remove excess bacteria. The pellet (~250 11) was washed with 
500 Ll lysis buffer (200 mM DTT, 0.25% SDS, 20 mM HEPES pH 8.0, 3% sucrose) 
and resuspended in 1,000 11 lysis buffer. Worms were incubated in lysis buffer with 
gentle rocking for 6.5 min at room temperature. The pellet was washed 6 x with 
M9 and resuspended in 20 mg ml! pronase from Streptomyces griseus (Sigma- 
Aldrich). Worms were incubated at room temperature (<20 min) with periodic 
mechanical disruption by pipetting every 2min. When most worm bodies were 
dissociated, leaving only small debris and eggs, ice-cold PBS buffer containing 
2% fetal bovine serum (Gibco) was added. RNA from FACS-sorted neurons was 
prepared for RNA-seq and subsequent analysis (see Extended Data for details). 
No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and the investigators were not blinded to allocation during 
experiments and outcome assessment. 
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Short-term associative memory assay. Memory assays were performed as 
described’. 

Axon regeneration assays. In vivo laser axotomy of PLM neurons was performed 
as described’. 
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and cell-non-autonomous targets are distinct. The number of genes 
that overlap between neuronal DAF-16-rescued whole-worm targets 
(Punc-119::daf-16::gfp;daf-16;daf-2 vs daf-16;daf-2) and isolated neuron 
IIS targets (daf-2 vs daf-16;daf-2) is shown (Supplementary Table 8). 
Hypergeometric distribution analysis (P values) shows that the extent 
of overlap between the gene categories is not significant. 
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Extended Data Figure 6 | Short-term associative memory phenotypes 
obtained upon knocking down neuronal IIS genes in daf-2 mutants 
and wild-type animals. daf-2 is required for various forms of C. elegans 
associative learning”’”*!**. daf-16 is required for the improvements 

and extensions of abilities with age of daf-2 mutants”. daf-2 mutants are 
defective for salt chemotaxis learning””*?**, and daf-16 is not involved in 
salt chemotaxis learning’”*!?. Furthermore, salt learning utilizes a unique 
daf-2c isoform’ in a daf-16-independent manner”’, suggesting a learning 
mechanism distinct from the associative memory paradigms studied here. 
We are specifically interested in understanding how activation of DAF-16 
results in the improved and extended abilities of daf-2 mutants to carry 
out olfactory associative learning’, short-term associative memory’, and 
long-term associative memory’, all of which require daf- 16. a, Chemotaxis 
index profile of wild type (N2) and daf-2 animals at time points following 
memory training. b, RNAi knockdown of sod-3, a non-neuronal DAF-16- 
regulated target that influences lifespan, has no effect on the extended 


short-term associative memory (STAM) of daf-2 mutants when treated 
with RNAi-feeding bacteria throughout the whole life (b) or only the 
post-developmental (adult-only) period (c, d) of the animal. daf-2 worms 
treated with daf- 16 RNAi have defective STAM, as previously reported’. 
e, Knockdown of the neuronal IIS candidate genes zip-5 and best-23 does 
not affect STAM. Time-courses showing the chemotaxis index for each time 
point are shown in d and e. Learning indices are shown in b, ¢, f and g. 

b-e, Two-way repeated measures ANOVA, Bonferroni post hoc tests. 

f, Treatment of daf-2 worms with neuronal DAF-16 target RNAi does 

not affect short-term associative learning. g, Neuronal-RNAi sensitive 
worms (Punc-119::sid-1) in a wild-type background were treated only 
during adulthood with RNAi targeted against the neuronal DAF-16 target 
genes. Learning (0h) and 1 h short-term associative memory time points 
are shown. a-g, Mean +s.e.m., *P < 0.05, **P< 0.01, ***P< 0.001, 
#EEEP < 0.0001. 
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Extended Data Figure 9 | Knocking down fkh-9 via RNAi or using 
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Targeting PTPRK-RSPO3 colon tumours promotes 
differentiation and loss of stem-cell function 


Elaine E. Storm!, Steffen Durinck?, Felipe de Sousa e Melo!, Jarrod Tremayne?, Noelyn Kljavin!, Christine Tan‘, Xiaofen Ye’, 
Cecilia Chiu’, Thinh Pham®, Jo-Anne Hongo’, Travis Bainbridge’, Ron Firestein®, Elizabeth Blackwood’, Ciara Metcalfe’, 
Eric W. Stawiski*, Robert L. Yauch®, Yan Wu4 & Frederic J. de Sauvage! 


Colorectal cancer remains a major unmet medical need, prompting 
large-scale genomics efforts in the field to identify molecular drivers 
for which targeted therapies might be developed'~>. We previously 
reported the identification of recurrent translocations in R-spondin 
genes present ina subset of colorectal tumours*. Here we show that 
targeting RSPO3 in PTPRK-RSPO3-fusion-positive human tumour 
xenografts inhibits tumour growth and promotes differentiation. 
Notably, genes expressed in the stem-cell compartment of the 
intestine were among those most sensitive to anti-RSPO3 treatment. 
This observation, combined with functional assays, suggests that a 
stem-cell compartment drives PTPRK-RSPO3 colorectal tumour 
growth and indicates that the therapeutic targeting of stem-cell 
properties within tumours may be a clinically relevant approach 
for the treatment of colorectal tumours. 

Molecular characterization of colorectal cancer (CRC) has revealed 
that the vast majority of tumours exhibit Wnt pathway activation that 
is largely driven through mutation of downstream signalling compo- 
nents!. Despite this knowledge, the development of therapies targeting 
the Wnt pathway has been challenging and CRC remains one of the 
top three most prevalent and deadly cancers®. We previously reported 
the identification of recurrent gene fusions in RSPO2 and RSPO3 in 
colorectal tumours*. R-spondin fusion tumours exhibited Wnt pathway 
activation, but did not contain common Wnt pathway mutations, such 
as APC’. R-spondins are known to amplify Wnt signalling®*, suggest- 
ing that elevated expression resulting from translocation may drive 
Wnt-dependent tumour growth. 

To test this hypothesis, we generated specific, function-blocking anti- 
bodies against RSPO2 and RSPO3 (Extended Data Fig. 1). We next 
screened patient-derived xenograft (PDX) samples and identified two 
RSPO3-fusion models, CRCA and CRCB, that harbour the PTPRK(e1)- 
RSPO3(e2) gene fusion*. These models do not contain common down- 
stream mutations in the Wnt pathway, but do harbour BRAF(V600E), 
PIK3CA(E545K), SMAD4(D537E) and TP53(I195T) mutations 
(CRCA) and KRAS(G12V), PIK3CA(E545K), SMAD4(C499Y) and 
TP53(W53*) mutations (CRCB). Both models express elevated RSPO3, 
but do not express RSPO1, 2 or 4 (Extended Data Fig. 2). This is similar 
to what is observed in RSPO3-fusion colon tumours, but is in con- 
trast to the normal colon where both RSPO2 and RSPO3 are expressed 
(Extended Data Fig. 2). 

Treatment with anti-RSPO3 inhibited tumour growth in both models, 
demonstrating that the oncogenic driver of these tumours is RSPO3 
(Fig. 1a). Notably, tumours continued to grow for about one week, fol- 
lowed by either stasis (CRCA), or regression (CRCB), which persisted 
for at least 30 days (Extended Data Fig. 2). To characterize the static 
response of CRCA, we analysed end-of-study samples. Tumours from 
the control group continued to proliferate with evidence of secretory 
cell differentiation (Fig. 1b). By contrast, anti- RSPO3-treated tumours 


exhibited large regions primarily composed of mucus. Epithelial cells 
that remained had a differentiated appearance and strongly reduced 
Ki67 positivity (Fig. 1b, arrows). The accumulation of mucus suggests 
measurements may be overestimating tumour content and that stasis 
may be the result of differentiation. 

To gain insight into the mechanism through which anti-RSPO3 pro- 
moted tumour growth inhibition, we first performed histopathological 
analysis 4 days after treatment initiation. Tumours from both models 
were similarly organized and highly proliferative, with ongoing differen- 
tiation, as revealed by KRT20 and MUC2 staining (Extended Data Fig. 3). 
There were some apparent differences between the models. MUC2 
staining was largely extracellular in CRCA, whereas intracellular MUC2 
was apparent in CRCB (Extended Data Fig. 3, arrows). In addition, Ki67 
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Figure 1 | Anti-RSPO3 inhibits tumour growth of RSPO3-fusion- 
positive PDX models. a, Tumour growth inhibition in CRCA and CRCB 
following treatment with antibodies at 30 mgkg ', twice a week, for three 
(CRCA) or four (CRCB) weeks. Closed circles, anti-ragweed treated. 
Open circles, anti-RSPO3-treated. Data represented as means + s.e.m. 
n=9 CRCB ragweed and n= 10 for all other groups. b, Sections stained 
as indicated from representative tumour samples of CRCA collected at 
the end of study. Arrows indicate tumour cells that appear differentiated. 
Scale bars, 100m. Xenograft experiments were performed at least two 
independent times. H&E, haematoxylin and eosin. 


1Molecular Oncology, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, USA. 2Molecular Biology, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, USA. 
3Translational Oncology, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, USA. Antibody Engineering, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, USA. 
5Discovery Oncology, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, USA. Research Pathology, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, USA. 


7Protein Chemistry, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, USA. 


7 JANUARY 2016 | VOL 529 | NATURE | 97 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


a b ASCL2 AXIN2 
Se i ee 


log,[fold change] CRCB 
> 
Anti-ragweed 


CRCA 


1 I 
1 | 
ASCL2, | | 
I | 
1 | 


T T T T T T T 
6 -4+ -2 0 2 4 6 


log, [fold change] CRCA 


405 . e@ Crypt 
a Differentiated 


7) 
Anti-RSPO3 
‘ 

7! pt 
Z 

—_~ 


30-4  LGRS 


-log,9{q value] 
nN 
i=} 
1 


ASCL2 


T T T T 
6 +4 -2 0 2 4 6 


a 
CRCB 


@ Crypt 


ASCL2 
° @ Differentiated 


—log, gig value] 


Ae 


log, [fold change] (anti-RSPO3/anti-ragweed) 


Figure 2 | Anti-RSPO3 promotes differentiation. a, Scatter plot of the 
expression response of CRCA and CRCB. Grey dots, individual genes. 
Triangles, specific genes. Dotted lines indicate a twofold change in 
expression. b, Representative images of in situ hybridization of ASCL2 
and AXIN2, as indicated. Arrows indicate residual expression. Scale bars, 
20 1m. c, d, Volcano plots indicating differentially expressed genes from 
CRCA (c) and CRCB (d). Genes enriched in the stem/undifferentiated 
compartment of the colon’? are indicated in red. Genes enriched in the 
differentiated compartment of the colon’ are indicated in blue. n= 3 for 
all groups. 


Homeostasis Homeostasis 


Anti-RSPO3 Anti-RSPO2 Anti-ragweed 


Anti-RSPO2 + 3 


98 | NATURE | VOL 529 | 7 JANUARY 2016 


was more uniformly reduced in CRCB following anti-RSPO3 treat- 
ment (Extended Data Fig. 3). To extend these analyses, we generated 
RNA-sequencing data from day 4 samples (Supplementary Tables 1 
and 2). Comparison of the global response of the two models revealed 
notable similarities (Fig. 2a), including the downregulation of Wnt target 
genes, which were statistically enriched in our data set (Extended Data 
Fig. 4). However, the effect on individual Wnt target genes varied. 
For example, AXIN2, MYC and CCND1 were modestly reduced 
(Fig. 2a and Extended Data Fig. 4) and were not present in the top 
100 most downregulated genes among those statistically significantly 
regulated (Supplementary Table 1). In contrast, LGR5 ranked sec- 
ond and ASCL2 ranked among the top five in both models (Fig. 2a, 
Extended Data Figs 4 and 5 and Supplementary Table 1). In situ hybrid- 
ization confirmed the sensitivity of ASCL2, which was expressed in 
most of the undifferentiated cells of the tumours and was markedly 
reduced following anti-RSPO3 treatment (Fig. 2b). In addition to 
LGR5 and ASCL2, which are well-characterized markers of intestinal 
stem cells’, the stem-cell marker genes LRIG1 and TERT were also 
downregulated (Supplementary Table 1). The sensitivity of Wnt target 
genes expressed in stem cells suggest they require RSPO3 to achieve the 
high level of Wnt signalling that is associated with stem-cell activity’®. 
Indeed, it was recently reported that in murine organoids Ascl2 is 
expressed when a threshold of Wnt signalling is reached that is pro- 
moted by R-spondins!!. 

Genes that were upregulated upon anti-RSPO3 treatment included 
markers of differentiation!” (Fig. 2a, c, d, Extended Data Fig. 5 and 
Supplementary Table 1), suggesting that anti-RSPO3 disrupts the 
cellular hierarchy within the tumour. Therefore, we compared the gene 
signatures of CRCA and CRCB with published gene signatures derived 
from the stem-cell and differentiated-cell compartments of the human 
colon!3. In both models, treatment with anti-RSPO3 resulted in the 
downregulation of genes expressed in the stem-cell compartment and 
an upregulation of the genes expressed in differentiated cells (Fig. 2c, d 
and Extended Data Fig. 6). This effect was specific to RSPO3-fusion 
tumours, as anti-RSPO3 treatment had no effect on tumour growth 
or molecular response in two different APC mutant PDX models 
(Extended Data Fig. 7). 

Our data indicate that anti-RSPO3 specifically shifts fusion tumours 
to a more differentiated phenotype, as previously reported in colon 
tumours following Wnt pathway inhibition'*!°. To further explore this 
process, we characterized the molecular response at three different time 
points following a single dose of anti-RSPO3. AXIN2, LGR5 and ASCL2 
were reduced within 24h and LGR5 and ASCL2 remained low through 
Regeneration Figure 3 | RSPO2 and RSPO3 are required for 
normal stem-cell function in the intestine. 
Representative sections of jejunum from animals 
treated with antibodies during homeostasis 
(n= 3) or regeneration (n =5), stained with 
haematoxylin and eosin or anti-GFP. Scale 
bars, 501m. Experiments on normal intestinal 
homeostasis were performed at least twice. 
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10 days. By contrast, markers of differentiation began to increase by 
4 days and continued to increase through 10 days (Extended Data 
Fig. 8). 

The immediate and robust effect of anti-RSPO3 on stem-cell markers, 
combined with reports indicating that R-spondin overexpression 
expands the intestinal stem-cell compartment!” and is required for 
culture of intestinal stem cells!’, led us to hypothesize that RSPO3 reg- 
ulates stem-cell function. To investigate this possibility, we turned to 
the normal intestine, where stem-cell biology is well characterized’. 
Lgr5 marks stem cells in the normal intestine that are sensitive to 
R-spondin overexpression!”"?. While ablation of Lgr5-positive cells 
does not have an acute effect on homeostasis, these cells are required for 
regeneration following irradiation injury”””!. We therefore examined 
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the effect of blocking R-spondins in the intestine during homeosta- 
sis and regeneration. As the normal intestine expresses both RSPO2 
and RSPO3 (Extended Data Fig. 2), we included an anti-RSPO2/anti- 
RSPO3 combination group in the study. Similar to the effect of Lgr5* 
cell ablation?°”!, there was minimal impact of antibody treatment 
during homeostasis, but a profound impact on regeneration follow- 
ing irradiation (Fig. 3 and Extended Data Fig. 9). Strong effects were 
observed in the combination group where Lgr5~ cells were difficult to 
identify and the regeneration of the epithelium following irradiation 
was markedly impaired (Fig. 3). The severity of the phenotype of the 
combination group suggests there is functional redundancy between 
Rspo2 and Rspo3 in the normal intestine not present in RSPO3- 
fusion-positive tumours. 

To address whether anti-RSPO3 reduces stem-cell function in 
tumours, we performed a serial transplantation study. Mice implanted 
with CRCA tumours were treated with either control or anti-RSPO3 
antibody (Fig. 4a) and tumour fragments from each group were trans- 
planted into naive recipient mice, which were subsequently treated with 
either control or anti-RSPO3 antibody. Transplanted tumours from 
control animals engrafted at a 97% (31/32) rate and initiated robust 
tumour growth following transplantation (Fig. 4b, c). By contrast, 
the engraftment rate was only 44% (14/31) from anti-RSPO3-treated 
tumour fragments and these tumours grew more slowly (Fig. 4b, c). 
Continuation of anti-RSPO3 treatment following transplantation 
profoundly impacted tumour engraftment and growth, with regression 
occasionally being observed in both groups (2/8 control pre-treated, 
3/4 anti-RSPO3 pre-treated) (Fig. 4b, c). To test whether the reduction 
in tumour engraftment was accompanied by a decrease in tumour- 
initiating cell content, we performed a flow cytometric analysis using 
CD133 and CD44, two proteins previously identified as markers for 
tumour-initiating cells in colon cancer”*3. Anti-RSPO3 reduced 
the number of CD133* and CD44" cells in both models (Extended 
Data Fig. 10). 

Cells with stem-cell properties have been identified in multiple 
tumour types and have formed the basis of a ‘cancer stem cell hypothesis’ 
where a population of cells within a tumour has enhanced long-term 
tumour-propagating potential and can promote relapse following 
therapy cessation”*~*°. It remains to be established whether a stem- 
cell compartment could drive colorectal tumour growth and whether 
targeting these cells could provide an effective therapeutic strategy. 
Our data, along with published data on the role of RSPOs in stem-cell 
biology’’ highlight a role for RSPO3 in regulating stem-cell function 
in PTPRK(e1)-RSPO3(e2)-fusion-positive tumours. RSPO2 may have a 
similar role in tumours where fusions have been identified*”’. However, 
the role for RSPO2 as a driver could be more complex as it has been 
proposed to be a tumour suppressor in other contexts”*, 

Molecular characterization of CRC has revealed that virtually all 
tumours exhibit aberrant Wnt signalling!. Wnt is a known regulator 
of stem-cell hierarchy with the highest level of activity associated 
with stem-cell properties and tumour-initiating ability?"!!. Anti- 
RSPO3 reduced, but did not eliminate Wnt activity, consistent with 
being a ligand-dependent amplifier of pathway activation® *. This 
reduction was associated with tumour growth inhibition, suggest- 
ing that RSPO3-fusion tumours depend on the high level of Wnt 
activity associated with stem cells and promoted by R-spondins!'. 
Correspondingly, direct inhibition of Wnt ligand production was 
recently reported to promote differentiation in RSPO3-fusion 
tumour models!*. However, our data in the normal intestine suggest 
that inhibiting Wnt signalling during regeneration may not be toler- 
ated (Fig. 3). The functional reduncancy of the R-spondins present 
in the normal intestine creates a unique opportunity to specifically 
target the oncogenic driver of RSPO3-fusion tumours with fewer 
safety concerns. 


Online Content Methods, along with any additional Extended Data display items and 


Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


No statistical methods were used to predetermine sample size. 

Antibody affinity. Binding affinities of anti-RSPO antibodies were measured by 
surface plasmon resonance (SPR) using a BIAcoreTM-2000 instrument. The CM5 
biosensor chip was activated with N-ethyl-N’-(3-dimethylaminopropyl)carbodiimide 
hydrochloride (EDC) and N-hydroxysuccinimide (NHS) reagents according to 
the supplier’s (GE Healthcare Biosciences) instructions. RSPO antigens (human 
RSPO2 and human RSPO3 (Genentech, Inc.); human RSPO1, human RSPO4, and 
mouse RSPOs (R&D Systems) were immobilized onto the biosensor chip to achieve 
approximately 30 response units (RU), followed by blocking with 1 M ethanola- 
mine. For kinetic measurements, fourfold serial dilutions of anti-RSPO antibodies 
were injected in a range of 0.0976-100 nM HBS-P buffer (0.01 M HEPES pH 7.4, 
0.15 M NaCl, 0.005% surfactant P20) at 25°C with a flow rate of 30] min7!. 
Association rates (ko) and dissociation rates (Kor) were calculated using a 
simple one-to-one Langmuir binding model (BlAcore Evaluation T200 Software 
version 2.0). The equilibrium dissociation constant (Kg) was calculated as the 
ratio kog/kon- 

Antibody activity. 293T cells (Genentech cell bank, authenticated by STR pro- 
filing, SNP fingerprinting and mycoplasma tested) were reverse transfected and 
plated in 10011 of DMEM containing 2.5% fetal bovine serum under the follow- 
ing conditions per well: 9,000 cells, 0.04 1g Topbrite 25 plasmid, 0.02 jug SV-40 
Renilla plasmid, 0.25 11 Fugene 6 (Promega). Following 16-20h of culture, cells 
were stimulated with 25 11 ofa 5x Wnt/RSPO solution in DMEM 10% fetal bovine 
serum. Cells were stimulated with a final concentration of 10 ng ml”! rmWnt3a 
(R&D Systems) and the ECso of RSPO2 (11.6 pM, Genentech, Inc.), or the ECso of 
RSPO3 (10.5 pM, Genentech, Inc.) + anti-RSPO2 or anti-RSPO3 for 6h at 37°C. 
Following, luciferase activity was detected using the Promega Dual-Glo system 
(Promega) according to the manufacturer's instructions. Data were analysed as a 
ratio of Firefly/Renilla (RLU WNT reporter). 

Animal studies. All studies involving animals were approved by Genentech’s 
Institutional Animal Care and Use Committee and adhere to the NRC Guidelines 
for the Care and Use of Laboratory Animals. For xenograft studies, animals were 
randomized into treatment groups based on starting tumour volumes, when 
tumours reached a mean volume of approximately 150-300 mm?. Animals were 
humanely euthanized according to the following criteria: clinical signs of persis- 
tent distress or pain, significant body weight loss (>20%), tumour size exceeding 
2,000 mm, or when tumours ulcerate. Maximum tumour size permitted by the 
Institutional Animal Care and Use Committee is 3,000 mm? and in none of the 
experiments was this limit exceeded. 

Patient-derived xenograft studies. PDX studies were performed by implanting 
primary tumour fragments subcutaneously in Balb/C nude mice. Animals were 
distributed into treatment groups (n= 10 per group) when tumours reached a 
mean volume of approximately 150-300 mm’. Control antibody (anti-ragweed; 
30 mgkg’) or anti-RSPO3 (30 mgkg ') was injected in the intraperitoneal space 
twice a week for 3-4 weeks. Tumour size and body weight measurements were col- 
lected twice a week. Tumour volume was determined using digital callipers (Fred 
V. Fowler Company, Inc.) using the formula / x w*, where Lis length and w is width. 
Mice with tumour volumes exceeding 2,000 mm’, or that had lost >20% of their 
initial bodyweight were humanely euthanized. Experiments were double-blinded. 
Homeostasis experiments. The effects of antibodies on intestinal stem cells during 
homeostasis was explored using LGR5-DTR mice”! (n=3 per group). Males and 
females 12-16 weeks were used. Animals were injected with antibody (30 mg kg!) 
twice a week and samples of intestine were collected on day 4 or day 9. 
Regeneration experiments. Regeneration experiments were performed on 
6-12 week Balbc/J female mice (n=5 per group). Mice were injected with antibod- 
ies (30 mgkg~') on days 0, 3 and 6 and subjected tol0 Gy whole-body irradiation 
on day 3. On day 7, 4 days following irradiation, the mice were humanely eutha- 
nized and samples of intestine collected for histology. Homeostasis and regenera- 
tion experiments were not blinded. 

Histology. Samples were formalin-fixed and paraffin-embedded using standard 
procedures. For Ki67 antibody stains, rehydrated sections were pressure-cooked 
for 15 min in antigen unmasking buffer (DAKO), blocked in serum-free protein 
block (DAKO) and incubated in anti-Ki67 (1:400, Sigma) overnight. Sections were 
then incubated in HRP-conjugated anti-rabbit (DAKO) and detected with DAB 
reaction (DAKO). GFP in LGR5-DTR mice was detected in frozen sections with 
anti-GFP (Torrey Pines) followed by Alexa Fluor 488 (Life Technologies). 
Tumour dissociation and flow cytometry. Tumour-bearing mice were treated 
with anti-ragweed or anti-RSPO3 at 30 mgkg! for 14 days. Tumours were col- 
lected from n=5 mice and digested enzymatically for 30 min with a mixture of 
collagenase (1.5mg ml/; Roche) and hyaluronidase (20 bg ml~!) at 37°C. The 
cells were then filtered (701m pore size) and washed. Cell debris was removed by 
lympholyte (M; Cedarlane) centrifugation or ACK lysis. The cell suspensions were 
stained with the following antibodies for flow cytometry analysis: AC133 (Miltenyi 
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Biotec, 1:100), CD44 (BD Biosciences, 1:100). Dead cells were excluded with Sytox 
blue (1:1,000, Invitrogen). Gating of singlet live cells was used to determine the 
percentage of CD133 or CD44 within each independent tumour. 

RNA isolation and PCR. RNA from samples was isolated using a RNeasy 
Plus kit according to the manufacturer's instructions (Qiagen). (RT-PCR was 
performed in 10 1] reactions using 50 ng total RNA using One-step Real-time 
RT-PCR mastermix (Life Technologies) according to the manufacturer's instruc- 
tions. The following Taqman assays from Life Technologies were used: AXIN2 
(Hs00610244_m1), ASCL2 (Hs00270888_s1), LGR5 (Hs00969422_m1), CA4 
(Hs00426343_m1), MYC (Hs00153408_m1), CEACAM7 (Hs03988977_m1). 
Identification of RSPO3 fusions was performed using a PTPRK(e1)-RSPO3(e2), 
specific Taqman assay: forward (TCTCCTTGGCCTCTCCTGGGAT), reverse 
(TTGGCAGCCTTGACTAACGTT), probe (ITTCTCCGCAGTGCATC). 
RNA-seq. RNA-seq libraries were prepared using TruSeq RNA Sample Preparation 
kit (lumina, CA). The libraries were sequenced on Illumina HiSeq 2500 sequenc- 
ers and we obtained on average 34 million single-end reads (50 bp) per sample. 
RNA-seq reads were aligned to the human genome version NCBI GRCh37 
using GSNAP. Expression counts per gene were obtained by counting the num- 
ber of reads aligned concordantly within a pair and uniquely to each gene locus 
as defined by NCBI, Ensembl gene annotations, and RefSeq mRNA sequences. 
Differential gene expression analysis was performed using edgeR. Gene enrich- 
ment analysis was performed on the edgeR differential expression results using the 
GseaPreranked tool available through the Broad’s GSEA application. The colon 
crypt gene sets used for GSEA analysis were derived from the paper by Kosinsky 
et al.', Wnt target gene sets were derived from the paper of Fevr et al.””. DESeq 
was used to compute the variance stabilized expression values for plotting the 
expression heat maps. 

In situ hybridization. Non-isotopic in situ hybridization was performed using 
probes from Affymetrix to human ASCL2 (catalogue no. VA1-17147), and 
AXIN2 (no. VA1-10388) with a probe set to Bacillus subtilis dihydropicolinate 
reductase (no. VF-11712) used as a negative control. A modified, manual version 
of the ViewRNA eZ Detection Kit (Affymetrix) was used for amplification and 
detection of hybridized signal. In brief, 4-j1m formalin-fixed paraffin-embedded 
sections were deparaffinized in Xylenes (Richard Allen Scientific) and then 
air-dried. Sample pre-treatment was done by incubating slides in 1x Affymetrix 
Pretreatment Solution for 15 min at 99°C in a PT Module (Thermo Fisher) then 
digested in Protease QF (Affymetrix) diluted 1:120 in PBS for 25 min at 40°C. 
Slides were then post-fixed in 4% paraformaldehyde and blocked with Bloxall 
(Vector Labs) before hybridization with probes diluted 1:65 for 2.5h at 40°C. 
Branched DNA amplification was done by incubating in Amp1 solution for 25 min, 
Amp2 solution for 18 min and Amp3 solution for 18 min all at 41.7°C then Label 
Probe-AP solution at 1:1,000 dilution for 15 min at 40°C. Hybridization was 
visualized using Warp Red substrate (Biocare) and samples were counterstained in 
haematoxylin. Slides were imaged using a Nanozoomer XR (Hamamatsu). 
Exome sequencing. Sequencing reads were mapped to a combined genome 
with both the UCSC human genome (GRCh37/hg19) and the mouse genome 
(mm9) using BWA software*? set to default parameters. Because of potential 
mouse stromal contamination, using a combined genome has previously been 
shown to increase the overall variant calling accuracy when analysing xeonograft 
exome data*!. Local realignment, duplicate marking were performed as described 
previously. Reads unambiguously mapped to the human genome were used for 
variant calling using the Strelka** program. Known germline variants represented 
in dbSNP Build 131 (ref. 34) or 6,515 previously published normal exomes*», but 
not represented in COSMIC v62 (ref. 36), were removed. 


29. Fevr, T., Robine, S., Louvard, D. & Huelsken, J. Wnt/3-catenin is essential for 
intestinal homeostasis and maintenance of intestinal stem cells. Mol. Cell. Biol. 
27, 7551-7559 (2007). 

30. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows— 
Wheeler transform. Bioinformatics 25, 1754-1760 (2009). 

31. Tso, K. Y., Lee, S. D., Lo, K. W. & Yip, K. Y. Are special read alignment strategies 
necessary and cost-effective when handling sequencing reads from 
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protein-coding variants. Nature 493, 216-220 (2013). 
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Extended Data Figure 1 | Characteristics of anti-RSPO2 and anti- 
RSPO3 antibodies. a, Surface plasmon resonance sensograms of 
anti-RSPO2 (0-125 nM) binding to human and mouse R-spondins (as 
indicated; h, human; m, mouse). b, Surface plasmon resonance sensograms 
of anti-RSPO3 (0-125 nM) binding to human and mouse R-spondins (as 
indicated). c, Affinities of anti-RSPO2 and anti-RSPO3 binding to human 
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Extended Data Figure 2 | Tumour growth inhibition persists following 
anti-RSPO3 treatment cessation. a, RNA-seq expression of RSPO1-4 and 
identification of RSPO3-fusion by fusion-specific quantitative reverse 
transcription PCR (qRT-PCR) in colon tumour, normal and PDX models. 
Expression data from normal colon and colon tumours were analysed from 
the publicly available data EGA accession EGAS00001000288 (ref. 4). Data 
are mean +s.d. Normal n= 72; non-fusion tumour n= 67; PTPRK(e1)- 
RSPO3(e2) n=4; CRCA n=3 different tumours from the model; CRCB 
n= 3 different tumours from the model. RPKM, reads per kilobase per 


million. Expression levels greater than 0 but less than 0.5 RPKM are 
graphed as 0.5. b, Individual tumour measurements from CRCA treated 
with anti-RSPO3 (30 mg kg!) twice a week for 4 weeks. Tumours were 
monitored for 30-40 days following the last dose of antibody (m= 10). 
c, Individual tumour measurements of CRCB data presented in Fig. 1. 
Anti-ragweed, n = 9; anti-RSPO3, n= 10. Tumour growth was followed 
for an additional 30-40 days following the last dose of antibody. Closed 
circles, anti-ragweed. Open circles, anti-RSPO3. Black bar indicates the 
time during which mice were treated with antibodies. 
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Extended Data Figure 3 | Histopathological analysis of tumours from dosing and stained as indicated. CC3, cleaved caspase 3. Black arrows 
CRCA and CRCB. Sections from CRCA and CRCB (representative highlight extracellular MUC2 positivity, red arrows highlight intracellular 
tumour from n= 3 analysed) collected 4 days following the initiation of MUC2 positivity. Scale bar, 100 jm. 
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Extended Data Figure 4 | Effects of anti-RSPO3 treatment on Wnt 
pathway activity. a, Heat map of aCTNNB1 gene signature”’ in CRCA 
samples, n = 3 tumours per group. b, Heat map of a CTNNB1 gene 
signature’’ in CRCB, n =3 tumours per group. c, Gene set enrichment 


analysis of CRCA using a CTNNBI gene signature”. RW, anti-Ragweed- 
treated; R3, anti-RSPO3-treated; NES, normalized enrichment score; 
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FWER, family-wise error rate. d, Gene set enrichment analysis of CRCB 
using a CTNNB1 gene signature”’. e, f, RNA-seq expression of a subset 
of Wnt target genes in CRCA (e) and CRCB (f). Data are normalized to 
control group. Horizontal line indicates the mean. Closed circles, anti- 
ragweed treated. Open circles, anti-RSPO3 treated. n = 3 tumours. 
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Extended Data Figure 5 | Validation of gene expression changes by tumours. Open circles, anti-RSPO3-treated tumours. Horizontal line 
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Extended Data Figure 6 | Anti-RSPO3 treatment promotes differentiation. a, Gene set enrichment using the Kosinsky et al. gene set’ in CRCA. 
b, Gene set enrichment using the Kosinsky et al. gene set'? in CRCB. n = 3 per group. RW, anti-ragweed-treated; R3, anti-RSPO3-treated; 


NES, normalized enrichment score. 
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Extended Data Figure 7 | APC mutant PDX models fail to respond with antibodies (30 mgkg =!) for 3-4 weeks (n = 10 animals per group). 
to anti-RSPO3 treatment. Tumour growth inhibition (left) and Tumour growth data were normalized to initial tumour volume upon 
qRT-PCR of target genes (right) in APC mutant PDX models. treatment (100-150 mm?). qRT-PCR data (n= 5) were collected at 
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mutant PDX model. n= 10, anti-ragweed treated, n = 9, anti-RSPO3- mean +s.e.m. Closed circles, anti-ragweed. Open circles, anti-RSPO3. 
treated. b, APC(757fs, E1408*);KRAS(G12D) model. Mice were treated UND, undetermined C, value. 
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Extended Data Figure 8 | The downregulation of stem-cell marker gene Closed circles, anti-ragweed-treated. Open circles, anti-RSPO3-treated. 
expression is an early and robust response to anti-RSPO3 treatment. Horizontal line indicates the mean. nm =5 tumours per group per time 
a, Relative gene expression in CRCA tumour samples collected 1, 4 point. UND, undetermined C, value. 


and 10 days following a single injection of anti-RSPO3 (30 mg kg). 
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Extended Data Figure 9 | Combined treatment of anti-RSPO2 and anti- 
RSPO3 reduces LGRS. a, Quantification of Lgr5* cells in the jejenum of 
mice treated with antibody (indicated on x axis) for 4 days. Lgr5-GFP- 
positive crypts/total crypts (n= 10 for control, n = 20 for antibody treated) 
were counted in three different mice. Horizontal line indicates the mean. 
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b, Relative expression assessed by qRT-PCR from RNA isolated from the 
ileum of animals treated with antibody for 9 days. Data are normalized 
to the control antibody treated group. n =5, horizontal line indicates the 
mean. c, Haematoxylin and eosin stained sections from the intestine of 
mice treated with indicated antibodies for 9 days. Scale bar, 100 pm. 
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positive cells as a percentage of live cells following 8 days of anti-RSPO3 
treatment of CRCB. Closed circles, anti-ragweed treated. Open circles, 
anti-RSPO3 treated. Horizontal line indicates the mean. n=5. 


Extended Data Figure 10 | Anti-RSPO3 treatment reduces tumour- 
initiating cell content. a, Percentage of CD133 (left) and CD44 (right)- 
positive cells as a percentage of live cells following 14 days of anti-RSPO3 
treatment of CRCA. b, Percentage of CD133 (left) and CD44 (right)- 
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Species difference in ANP32A underlies influenza A 
virus polymerase host restriction 


Jason S. Long!, Efstathios S. Giotis!, Olivier Moncorgé?, Rebecca Frise!, Bhakti Mistry’, Joe James’, Mireille Morisson‘, 
Munir Iqbal’, Alain Vignal*, Michael A. Skinner! & Wendy S. Barclay! 


Influenza pandemics occur unpredictably when zoonotic 
influenza viruses with novel antigenicity acquire the ability to 
transmit amongst humans}. Host range breaches are limited by 
incompatibilities between avian virus components and the human 
host. Barriers include receptor preference, virion stability and 
poor activity of the avian virus RNA-dependent RNA polymerase 
in human cells”. Mutants of the heterotrimeric viral polymerase 
components, particularly PB2 protein, are selected during 
mammalian adaptation, but their mode of action is unknown?°, 
We show that a species-specific difference in host protein ANP32A 
accounts for the suboptimal function of avian virus polymerase 
in mammalian cells. Avian ANP32A possesses an additional 33 
amino acids between the leucine-rich repeats and carboxy-terminal 
low-complexity acidic region domains. In mammalian cells, avian 
ANP32A rescued the suboptimal function of avian virus polymerase 
to levels similar to mammalian-adapted polymerase. Deletion of 
the avian-specific sequence from chicken ANP32A abrogated this 
activity, whereas its insertion into human ANP32A, or closely related 
ANP32B, supported avian virus polymerase function. Substitutions, 
such as PB2(E627K), were rapidly selected upon infection of humans 
with avian H5N1 or H7N9 influenza viruses, adapting the viral 
polymerase for the shorter mammalian ANP32A. Thus ANP32A 
represents an essential host partner co-opted to support influenza 
virus replication and is a candidate host target for novel antivirals. 

The negative sense RNA genome of influenza virus is delivered into 
host cells associated with the viral RNA-dependent RNA polymer- 
ase, a heterotrimeric complex of PB1, PB2 and PA proteins and the 
viral nucleoprotein NP. The viral ribonucleoprotein (vVRNP) complex 
traffics to the cell nucleus where, in complex with various co-opted 
host factors, the viral polymerase directs transcription and replication 
of the genome’. This intimate virus—host interaction is suboptimal 
for an avian influenza virus in mammalian cells, such that productive 
replication or onwards transmission do not occur until adaptive muta- 
tions are selected***”. The host range restriction of influenza A virus 
polymerase can be recapitulated in vitro, whereby functional viral pol- 
ymerase, reconstituted by expression of PB1, PB2, PA and NP, directs 
amplification and expression of a model viral-like RNA!°. In mamma- 
lian cells, the low activity of reconstituted avian virus polymerase can 
be significantly increased by a single amino acid substitution (E627K) 
in PB2!"!?. In contrast, in avian cells, polymerases bearing either E or 
K at PB2 position 627 are active'''?. Influenza A viruses that circu- 
late in humans either have lysine at PB2 627 or, as in 2009 pandemic 
HINI1 virus, substitutions at residues 271, 590 and 591 (refs 14, 15) that 
map close to 627 on the polymerase structure!®, suggesting a common 
mechanism of host adaptation. 

We showed previously that heterokaryons formed by fusing human 
and avian cells supported avian virus polymerase activity, suggesting 
that host range restriction was explained by a species-specific difference 


in a positive avian host factor that was either lacking or different in 
mammals’”. To identify this factor, we took advantage of a chicken 
genome radiation hybrid panel based on the Wg3H hamster cell line 
and harbouring different fragments of chicken chromosomes!’. Each 
radiation hybrid clone was screened for the ability to support activity of 
a H5N1 avian influenza virus polymerase (A/turkey/England/50-92/91 
virus, hereafter called 50-92) compared with its human-adapted 
isoform bearing PB2(E627K). Four out of 53 clones supported activ- 
ity of the avian virus polymerase (Fig. 1a). Owing to instability of the 
chicken chromosome content, one positive clone lost this phenotype 
by passage 12. To identify chicken genes expressed in the positive 
radiation hybrid clones but not in negatives, total RNA was analysed 
using an Affymetrix chicken gene array. The four early passage posi- 
tive clones had 35 chicken genes in common (Fig. 1b). After P12, the 
remaining positive hybrids shared only 12 genes in common, all present 
on a region of chicken chromosome 10 (Fig. 1c and Extended Data 
Table 1). Genetic instability of the clones is revealed by principal 
component analysis (PCA) mapping analysis, as is variation between 
negative clones (Extended Data Fig. 1). 

We targeted candidate genes within this region and co-expressed 
each individually in human 293T cells with reconstituted H5N1 50-92 
influenza virus polymerase. Expression of a single chicken gene, 
chANP32A (chicken acidic (leucine-rich) nuclear phosphoprotein 
32 family, member A), rescued avian virus polymerase activity to levels 
similar to human-adapted PB2 627K viral polymerase (Fig. 1d). All 
positive hybrid clones expressed chANP32A (Extended Data Fig. 2). 
Knockdown of chANP32A abrogated the ability of positive radiation 
hybrid clone 476 to support avian virus polymerase, suggesting that this 
single chicken gene was responsible for the phenotype (Extended Data 
Fig. 3). Chicken homologues of DDX17 and members of the importin 
alpha family previously implicated in host range restriction of influ- 
enza virus polymerase did not rescue avian virus polymerase activity 
in human cells (Fig. 1d)!89, 

ANP32A is a member of a family of nuclear proteins implicated 
in multiple cellular pathways, including transcriptional regulation 
by chromatin remodelling, messenger RNA export and cell death”°. 
ANP32 proteins share conserved amino-terminal domains, com- 
prised of leucine rich repeats (LRR) and C-terminal low complexity 
acidic regions (LCAR) comprised of 60-75% glutamic or aspartic acid 
(Fig. 4a and Extended Data Fig. 8). We expressed cloned human homo- 
logue (huANP32A) as well as human and chicken genes for the related 
family member ANP32B. Expression of chANP32A, but not huANP32A, 
in human cells rescued activity of several different avian influenza 
polymerases including those from low (H1N1 A/duck/Bavaria/79 
(Bav), H9N2 A/UDL/2008 (UDL)), and high (H5N1 (A/turkey/ 
Turkey/05/2005 (Ty05) and A/turkey/England/50-92/91 (50-92)) path- 
ogenicity avian viruses, and a human H3N2 virus (A/Victoria/3/75 
(Vic)) when PB2 bore 627E (Extended Data Fig. 4). Expression of 
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Figure 1 | Identification of a positive avian cellular factor that permits 
avian influenza virus polymerase activity in mammalian cells by 
screening radiation hybrid clones. a, Radiation hybrid (RH) clones 
transfected with mouse-poll-firefly minigenome reporter, avian influenza 
virus H5N1 50-92 polymerase (either PB2 627E or PB2 627K) and Renilla 
expression control. Passage 1 clones black bars, passage 12 clones grey 
bars. (Data as a ratio of PB2 627E over PB2 627K polymerase activity 
(firefly normalized to Renilla); n=3 biological replicates; error as s.e.m. 
of the ratio; one-way ANOVA, comparisons to Wg3h, ****P < 0.0001). 
b, Venn diagram of microarray data analysis of P1 positive hybrid clones 
vs parent Wg3h cells. c, Venn diagram of P12 positive hybrid clones vs 
parent and 365P1 (positive RH clone) vs 365P12 (reverted RH clone) 
(total gene numbers in brackets); Two-way ANOVA (variables: clone 


chANP32A, but not hoANP32A, chANP32B or huANP32B, increased 
levels of all three species of minigenome RNA produced by reconsti- 
tuted avian viral polymerase in human cells (Fig. 2a—c). Increase in 
levels of viral genomic RNA (vRNA) and complementary antigenomic 
RNA (cRNA) in the polymerase assay indicates that RNA replication, 
and not only mRNA transcription or stability, was enhanced by the 
expression of the chicken host factor in mammalian cells. Expression 
of chANP32A in human cells did not affect levels or nuclear accumu- 
lation of PB2, suggesting that its ability to support polymerase activity 
in the cell nucleus is not mediated by enhanced nuclear trafficking 
(Extended Data Fig. 5). 
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and passage number) adjusted by Benjamini-Hochberg multiple-testing 
correction (false discovery rate (FDR) of P < 0.05). Statistically significant 
genes identified and those with fold-change values < + 1.5 removed. 

d, Cloned chicken genes from chromosome 10 or chicken homologues of 
genes previously implicated with PB2 host range (underlined) expressed 
in 293T cells for 20h before transfection with pHOM1-firefly minigenome 
reporter, avian virus 50-92 polymerase (either PB2 627E (Black bars) or 
PB2 627K (grey bars)) and Renilla expression control. Luciferase activity 
was assayed after a further 24h. (Data are firefly activity normalized to 
Renilla; n= 3 biological replicates; error as s.e.m.; one-way ANOVA, 
comparisons to empty vector, ****P < 0.0001, pattern of results consistent 
in at least three independent experiments.). 


Additionally, expression of chANP32A enhanced avian virus repli- 
cation in human cells. Levels of all three influenza RNA species derived 
from avian virus segment 1 were increased by expressed chANP32A 
but not by hu ANP32A (Fig. 2d-f). Moreover, the yield of infectious 
avian-like influenza virus was enhanced whether chANP32A was 
expressed in human cells at 33°C (Fig. 2g) or 37°C (Extended Data 
Fig. 4c). In contrast, ch ANP32A did not enhance yields of an isogenic 
PB2 627K recombinant virus, and overexpression of huANP32A 
reduced its titre (Fig. 2g and Extended Data Fig. 4c). 

We confirmed a role for ANP32A in supporting virus replication in 
avian cells. In chicken cells (DF-1, immortalized chicken fibroblasts), 


Figure 2 | Expression of chANP32A in human cells rescues 
transcription and replication of avian influenza virus. a—c, ch or 
huANP32A, ANP32B or empty vector were expressed for 20 h before 
transfection of avian virus 50-92 polymerase (either PB2 627E (black 
bars) or PB2 627K (grey bars)), and incubated at 37°C for 24h before 
reverse transcription followed by quantitative PCR (qRT-PCR) for 
luciferase gene of the viral minigenome reporter, VRNA (a), mRNA (b) 
and cRNA (c). d-f, 293T cells expressing chANP32A, huANP32A or 
empty vector for 20h before infection by avian influenza virus (H9N2 
A/chicken/UDL-01/2008) virus (multiplicity of infection (MOI) 1.0), 
incubated at 37°C for 24 h before qRT-PCR for viral segment 1; vRNA (d), 
mRNA (e) and cRNA (f). (Data expressed as fold change to empty vector, 
normalized to 18S RNA, calculated by AAC}; n = 3 biological replicates; 
error plotted as s.e.m.; one-way ANOVA, comparisons to empty vector; 
NS= not significant, *P < 0.05, **P< 0.01, ***P < 0.001, ****P < 0.0001; 
pattern of results consistent in at least three independent experiments.). 
g, 293T cells expressing ch or hANP32A or empty vector for 20 h before 
infection (MOI 0.1) with avian-like influenza virus (recombinant PR8 
virus with H5N1 Ty05 polymerase, M and NS gene segments and PR8 
HA and NA genes) bearing PB2 627E (black bars) or PB2 627K (grey 
bars). Cells incubated at 33°C and cell supernatant titrated for infectious 
virus 24 h post infection on MDCK cells by plaque assay. (Data displayed 
as logio(plaque forming units per ml); n =3 biological replicates; error 
plotted as s.e.m.; one-way ANOVA, comparisons to empty vector; NS, not 
significant, **P < 0.01 ****P < 0.0001; pattern of results consistent in at 
least three independent experiments). 
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Figure 3 | Knockdown of ANP32 reveals avian influenza polymerase 
dependence on chANP32A and dependence on huANP32A and B in 
human cells by human-adapted influenza polymerase. a, DF-1 cells 
transduced with vesicular stomatitis virus (VSV)-G lentiviral vectors 
delivering transgenes expressing puromycin and shRNA targeting 
chANP32A or negative. Puromycin-selected cells were transfected with 
pCOM1-firefly minigenome reporter, avian 50-92 polymerase (627E) and 
Renilla expression control. b, siRNA (100 nM) applied to DF-1 chANP32A 
shRNA cells. After 48 h, cells transfected with avian 50-92 polymerase 
(627E), minigenome reporter and Renilla expression control. Luciferase 
activity measured 20 h later. Knockdown in chicken DF-1 cells verified by 
immunoblotting using antibody against vinculin and chANP32A. c, DF-1 
cells depleted of chANP32A by siRNA infected with avian-like influenza 
virus (PR8 virus bearing H5N1 Ty05 polymerase genes with PB2 627E, 
MOI 0.01). 24h later cell supernatants were titrated for infectious virus 
by plaque assay on MDCK cells. d, 293T cells transduced with lentiviral 
vectors delivering transgenes expressing puromycin and shRNA targeting 
huANP32A, huANP32B, both huANP32A and B, or negative. Puromycin- 
selected cells were transfected with pHOM1-firefly minigenome reporter, 
human-adapted avian 50-92 polymerase (627K) and Renilla expression 
control. Luciferase activity measured after 20 h. Knockdown in 293T 

cells was verified by immunoblotting using antibody against vinculin, 
huANP32A and huANP32B. (a, b, d, Data are firefly activity normalized 
to Renilla, plotted as per cent of negative or ALLStars; n = 3 biological 
replicates; error as s.e.m.; one-way ANOVA comparisons to ALLStars or 
negative; NS= not significant, **P < (0.01, ***P< 0.001, ****P < 0.0001). 
e, Puromycin-selected A549 cells expressing shRNA against hoANP32 A 
and/or B were infected with human (H3N2) virus A/England/691/2010 
(MOI 0.1). After 24 h, cell supernatants were titrated by plaque assay on 
MDCK cells. (c, e, Data are % p.f.u. relative to ALLStars or negative; n =3 
biological replicates; error as s.e.m.; one-way ANOVA, comparisons to 
ALLStars or negative; ***P < 0.001, ****P < 0.0001; pattern of results 
consistent in at least three independent experiments.). 
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Figure 4 | Activity of avian influenza virus polymerase is dependent 

on a unique amino acid sequence present on avian ANP32A proteins. 
a, chANP32A protein schematic with 33 amino acid insertion relative to 
human homologue. b, huANP32A schematic representative of ANP32A 
from mammals, ostrich and of ANP32B. c, 293T cells transfected with 
Flag-tagged ANP32 and after 20 h transfected with pHOM1-firefly 
minigenome reporter, avian 50-92 polymerase (627E) and Renilla 
expression control. Luciferase activity measured 24 h later. (Data are PB2 
627E polymerase activity normalized to Renilla; n= 3 biological replicates; 
error plotted as s.e.m. of the ratio; one-way ANOVA, comparisons to 
empty vector; NS= not significant, ****P < 0.0001; pattern of results 
consistent in at least three independent experiments.) ty, turkey; dk, duck; 
zf, zebra finch; os, ostrich; mu, mouse; pg, pig. d, Immunoblot analysis of 
Flag-tagged ANP32A constructs using antibody against Flag peptide and 
vinculin. 


short hairpin RNA (Fig. 3a and Extended Data Fig. 6) and short inter- 
fering RNA (Fig. 3b) targeting chANP32A decreased activity of avian 
virus polymerase (PB2 627E). Knockdown of ANP32A in DF-1 cells 
also decreased the yield of infectious avian-like influenza virus (Fig. 3c). 

We next investigated whether the human homologues of ANP32A 
or B support activity of influenza polymerase in human cells. Human 
ANP32A and B were previously listed in the human influenza A virus 
interactome?!, and depletion of huANP32A and/or huANP32B from 
human cells decreased polymerase activity and replication of human 
influenza virus”””?, We used sh or siRNA to target hoANP32A and B in 
human cells. Human-adapted viral polymerase (50-92 PB2 627K) activ- 
ity was reduced when levels of either ANP32A or B were decreased, and 
further reduced in cells from which both proteins were depleted, sug- 
gesting a dependence on both family members (Fig. 3d and Extended 
Data Fig. 7). Yields of influenza virus A/England/691/2010, a seasonal 
H3N2 human virus, were decreased in cells depleted of ANP32A 
and/or ANP32B (Fig. 3e). We note with interest that both depletion 
and overexpression of human ANP32A or B were deleterious for poly- 
merase activity and virus yields in human cells (Fig. 2), which suggests 
that influenza virus relies on balanced expression of ANP32 proteins. 

Sequence alignment of avian and mammalian ANP32A revealed 
that avian (except ratite) ANP32A harbours an additional 33 amino 
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acids (176-208), derived partly by a repeat of amino acids 149-175, 
expressed from an additional exon absent from mammals and ostrich 
(Extended Data Fig. 8). Avian ANP32B does not contain the addi- 
tional sequence (Extended Data Fig. 8). We cloned and Flag-tagged 
ANP32A cDNA from chicken, turkey, duck, and zebra finch, all of 
which encode homologues with the additional sequence, and also from 
ostrich, human, mouse, and pig, all of which were mammalian-like in 
length (Fig. 4a, b, d). The four longer avian proteins increased avian, 
but not human-adapted, influenza virus polymerase activity in human 
cells (Fig. 4c and Extended Data Fig. 9). In contrast, the ostrich protein, 
or any of the mammalian homologues, did not rescue avian virus pol- 
ymerase activity. Interestingly, replication of avian influenza viruses in 
ostrich and other ratites selects for mammalian-adapted PB2 mutants”. 

Deletion of amino acids 176-208 from chANP32A abrogated its abil- 
ity to support avian virus polymerase. Conversely, insertion of the addi- 
tional avian-specific 33 amino acids bestowed upon either huANP32A 
or huANP32B the ability to support avian virus polymerase (Fig. 4c, d). 

In summaty, we conducted a novel screen of chicken genome radi- 
ation hybrid cells and identified a single chicken gene, ANP32A, that 
supports avian influenza virus polymerase activity in human cells. We 
suggest that avian influenza virus has co-evolved with its natural hosts, 
wild birds, to co-opt avian ANP32<A as a host factor that supports its 
polymerase activity in the nucleus. The avian influenza polymerase 
cannot efficiently utilize shorter ANP32 proteins, such as ANP32B 
or mammalian ANP32A homologues, for this activity. Acquisition 
of host-adapting mutations, such as PB2(E627K), enables polymer- 
ase activity to be supported by the shorter ANP32 proteins typical of 
mammalian hosts. The underlying mechanism of avian influenza pol- 
ymerase restriction in human cells has evaded researchers for decades. 
Others have previously demonstrated a role for PB2 residue 627 in its 
interaction with NP!”*, and viral RNA promoter sequences”°”’ that 
may influence recognition of incoming vRNPS by RIG-I*®. The nature 
of residue 627 has also been linked with a PB2 dependency on specific 
importin family members'*”’. How these observations are linked with 
the species difference in host factor ANP32A we describe here remains 
to be elucidated. 

Substitutions in PB2 are rapidly selected after avian influenza viruses 
enter mammalian hosts and PB2 adaptation is essential to support the 
pandemic potential of emerging avian influenza viruses. This strin- 
gent requirement for the virus to optimize replication efficiency in the 
human host suggests that disrupting the interaction of the virus with 
ANP32A may be a novel means for virus control. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 

Cells and cell culture. Chinese Hamster cell line (Wg3h), chicken genome RH 
clones (Laboratoire de génétique cellulaire, Institut National de la Recherche 
Agronomique, Castanet-Tolosan, France)!’, human embryonic kidney (293T) 
(ATCC), human lung adenocarcinoma epithelial cells (A549) (ATCC) and Madin- 
Darby canine kidney (MDCK) cells (ATCC) were maintained in cell culture 
media (Dulbecco’s modified Eagle’s medium (DMEM; Invitrogen) supplemented 
with 10% fetal calf serum (FCS) (Biosera) and with 1% penicillin-streptomycin 
(Invitrogen)). Chicken fibroblast (DF-1) (ATCC) cells were maintained in DF-1 
cell culture media (DMEM supplemented with 10% FCS, 5% tryptose phosphate 
broth (Sigma-Aldrich) and 0.1% penicillin-streptomycin (Invitrogen)). Cell lines 
were maintained at 37°C in a 5% CO atmosphere. Cell lines were authenticated 
by RT-PCR and verified negative for mycoplasma. 

Microarray analysis. Total RNA was extracted from 1 x 10° cells that were of the 
same passage as those tested for polymerase, using an RNeasy kit (Qiagen) accord- 
ing to the manufacturer's instructions. On-column DNA digestion was performed 
using RNase-free DNase (Qiagen) to remove contaminating genomic DNA. RNA 
samples were quantified using a Nanodrop Spectrophotometer (Thermo Scientific) 
and checked for quality using a 2100 Bioanalyzer (Agilent Technologies). All RNA 
samples had an RNA integrity number (RIN) > 9.8. Array hybridization was per- 
formed according to the manufacturer's instructions (Affymetrix). Labelled sam- 
ples were hybridized to the Affymetrix Chicken Gene 1.0 ST Arrays in a GeneChip 
Hybridization Oven for 16h at 45°C and 60rpm in an Affymetrix Hybridization 
Oven 645. After washing and staining, the arrays were scanned with the Affymetrix 
GeneChip Scanner 3000 7G. Gene-level expression signal estimates were derived 
from CEL files generated from raw data using the multi-array analysis (RMA) algo- 
rithm implemented from the Affymetrix GeneChip Command Console Software 
Version 3.0.1. Data Pre-Processing and filtering was done using Partek Software 
Version 6.6 and included: RMA background correction, quantile normalization 
across all chips in the experiment, log, transformation and median polish summa- 
rization. Statistically significant genes were discovered by comparisons between 
the positive RH clones, parent cells and negative RH clones by two-way ANOVA 
(variables: clone and passage number) adjusted with the Benjamini-Hochberg 
multiple-testing correction (false discovery rate (FDR) of P< 0.05). Statistically 
significant genes were identified and those with fold-change values <|+1.5| were 
removed. 

Cloning of candidate cDNAs. Sequence specific primers were used to amplify 
targeted cellular transcripts of chicken genes from chromosome 10, including: 
ANP32A, RAB1IA (Ras-related protein Rab-11A), TIPIN (TIMELESS Interacting 
Protein), RPL4 (Ribosomal Protein L4), PIAS1 (Protein Inhibitor Of Activated 
STAT, 1), TLE3 (Transducin-Like Enhancer Of Split 3), EIF3J (Eukaryotic 
Translation Initiation Factor 3, Subunit J) and CTDSPL2 (CTD (Carboxy-Terminal 
Domain, RNA Polymerase II, Polypeptide A) Small Phosphatase Like 2) from 
total RNA extracted from RH clone 476, using SuperScript III One-Step RT-PCR 
System (Invitrogen). Chicken ANP32B, DDX17 (DEAD (Asp-Glu-Ala-Asp) 
Box Helicase 17), IMPal, 3 and 7 (Importin «-1,3 and 7) cDNAs were ampli- 
fied from RNA extracted from DF-1 cells. PCR products were cloned into the 
pCAGGS expression vector. cDNAs of full length ANP32A isoforms of several 
species were generated by gene synthesis (GeneArt Strings DNA Fragments) and 
inserted into the pCAGGS expression vector, based on the following sequences: 
Chicken ANP32A (chANP32A) (Gallus gallus, XP_413932.3), human ANP32A 
(huANP32A) (Homo sapiens, NP_006296.1), zebra finch ANP32A (zfANP32A) 
(Taeniopygia guttata, XP_012424064.1), duck ANP32A (dkANP32A) (Anas platy- 
rhynchos, XP_005023024.1), turkey ANP32A (tyANP32A) (Meleagris gallopavo, 
XP_010715918.1), ostrich ANP32A (osANP32A) (Struthio camelus australis, 
XP_009665579.1), pig ANP32A (pgANP32A) (Sus scrofa, XP_003121807.3), 
mouse ANP32A (muANP32A) (Mus musculus, NP_033802.2), chicken ANP32B 
(chANP32B) (Gallus gallus, NP_001026105.1) and human ANP32B (huANP32B) 
(Homo sapiens, NP_006392.1). The sequence of dkANP32A was amended to 
contain the sequence encoding the intact N terminus by comparison with duck 
RNA-seq data (from ENA_ERP005909) that had been de novo assembled (using 
CLC Genomic Workbench 7.5.1), where the appropriate contig was identi- 
fied by BLASTX against the chicken ANP32A protein sequence. Furthermore, 
the duck sequence was confirmed by reverse-transcription of ANP32A mRNA 
derived from duck embryonic fibroblast cells and DNA sequencing. Mutants of 
these sequences were also generated by gene synthesis, as above, and included: 
chANP32A ,33 (chANP32A with the ‘avian insertion deleted (aal76-—208)), 
huANP32A+33 and huANP32B+33 (huANP32A and B with the avian sequence 
-VLSLVKDRDDKEAPDSDAEGY VEGLDDEEEDED- inserted after aal75 
(huANP32A) and aal73 (huANP32B)). All plasmid constructs were verified by 
DNA sequencing. Primers and sequence information are available upon request. 

Generation of recombinant and clinical Influenza A virus. Reverse genetics sys- 
tems for the following virus strains were used in this study: PR8 (H1N1 A/Puerto 
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Rico/8/1934)*°, UDL (H9N2 A/chicken/UDL-01/08)*! (developed in collaboration 
with M. Iqbal, Pirbright Institute, UK) and Ty05 (H5N1 A/Turkey/Turkey/5/2005) 
(a kind gift from R. Fouchier, Erasmus University, Netherlands). The PB2(E627K) 
substitution was made to 50-92 and Ty05 (K627E) by overlapping PCR of the PB2 
plasmid as previously described!!. Ty05:PR8 6:2 recombinant virus was generated 
with the HA and NA derived from PR8 and the internal genes of Ty05; virus rescue 
was performed by co-transfection of the 12-plasmid system: 8 poll plasmids as 
described above and 4 helper expression plasmids encoding A/Victoria/3/75 (VIC) 
polymerase components and NP expressed by the pCAGGS vector, as previously 
described***3. UDL virus rescue was performed by co-transfection of a 9 plasmid 
system including: 7 bidirectional pHW2000 plasmids*4 encoding PB2, PA, HA, NA, 
NP, NS and M genes, together with a poll plasmid encoding PB1 and a pCAGGS 
expression plasmid of the UDL-PB1 gene. Ty05 virus stocks were propagated in 
9-day-old embryonated chicken eggs incubated at 37°C. UDL virus stocks were 
generated in MDCK cells with infection media (serum free DMEM supplemented 
with 1% penicillin-streptomycin and 1 jg ml-! TPCK-treated trypsin (Lorne 
Labs)) and incubated at 37°C. Clinical isolate A/England/691/2010 (H3N2) (Public 
Health England) was propagated in MDCK cells with infection media. Aliquots of 
infectious virus were stored at —80°C. Infectious titres were determined by plaque 
assay on MDCK cells. 

Influenza A virus infection. 293T cells were transfected with ANP32A (500 ng, 
24-well) using Lipofectamine 3000 (Invitrogen) and after 20h infected with virus 
diluted in serum free DMEM for 1h at 33 or 37°C (MOlas indicated in the relevant 
figure legends) and replaced with cell culture medium (for RT-PCR analysis) or 
DMEM supplemented with 0.1% FCS and 1g ml! TPCK trypsin (Worthington- 
Biochemical) (for infectious virus titres). shRNA A549 cells were infected with 
virus as for 293T cells, except infection media lacked FCS. Infected cell lysates and 
cell supernatants were harvested at 12 and/or 24h post-infection. Infectious titres 
were determined by plaque assay on MDCK cells. 

Safety/biosecurity. All work with infectious agents was conducted in biosafety 
level 2 facilities, approved by the Health and Safety Executive of the UK and in 
accordance with local rules, at Imperial College London, UK. 

Polymerase assay. Influenza polymerase activity was measured by use of a minige- 
nome reporter which contains the firefly luciferase gene flanked by the non-coding 
regions of the influenza NS gene segment, transcribed from a species-specific 
poll plasmid with a mouse terminator sequence*’. The human and chicken poll 
minigenomes (pHOM1-Firefly and pCOM1-Firefly) are described previously**; 
pMouse-Poll-Firefly was generated by substituting in the mouse poll promoter 
sequence*”. pCAGGS expression plasmids encoding each polymerase component 
and NP for 50-92 (H5N1 A/Turkey/England/50-92/91), Ty05, VIC and BAV 
(A/Duck/Bavaria/1/77) are described previously!36°8, UDL PB1, PB2, PA and 
NP genes were sub-cloned into the pCAGGS plasmid. Mutagenesis of PB2 genes 
to encode PB2 627K or 627E was performed by overlapping PCR as described 
previously''*°, All plasmid constructs were verified by DNA sequencing. Primers 
and sequence information are available upon request. To measure influenza 
polymerase activity, 293T cells were transfected in 48-well plates with pCAGGS 
plasmids encoding the PB1 (20 ng), PB2 (20 ng), PA (10ng) and NP (40 ng) pro- 
teins, together with 20 ng species-specific minigenome reporter and 10 ng Renilla 
luciferase expression plasmid (pCAGGS-Renilla)® as an internal control, using 
Lipofectamine 3000 transfection reagent (Invitrogen) according to manufacturers’ 
instructions. Wg3h, RH clones and DF-1 cells were transfected as 293T cells but 
with twice the concentration of DNA and using Lipofectamine 2000 (Invitrogen). 
Cells were incubated at 37°C. 20h after transfection, cells were lysed with 5011 of 
passive lysis buffer (Promega), and firefly and Renilla luciferase bioluminescence 
was measured using a Dual-luciferase system (Promega) with a FLUOstar Omega 
plate reader (BMG Labtech). The effect of cellular factors on influenza polymerase 
was examined by polymerase assay after expression of constructs (250 ng) for 24h. 
shRNA mediated silencing. Silencing was achieved by lentivirus delivery 
of shRNA encoding transgenes. Lentiviral vectors were generated using the 
TRC1.5-pLKO.1-puro plasmid (MISSION Sigma-Aldrich) containing the shRNA 
sequence and puromycin selection gene. shRNA sequences for target genes were as 
follows: huANP32A (TRCN0000006905, 5’-CCGGCCTGAAGATGAGGGAGA 
AGATCTCGAGATCTTCTCCCTCATCTTCAGGTTTTT-3’ (target sequence, 
5'-CCTGAAGATGAGGGAGAAGAT-3’)), huANP32B (TRCN0000077928, 
5’-CCGGCCACCCAAAGAGCCAAAGAATCTCGAGATTCTTTGGCTCTTTG 
GGTGGTTTTTG-3’ (target sequence, 5’-CCACCCAAAGAGCCAAAGAAT-3’)), 
chANP32A (TRCN0000006902, 5’-CCGGCCTATTGTGATTTGACTGTT 
TCTCGAGAAACAGTCAAATCACAATAGGTTTTT-3’ (target sequence, 
5'-CCTATTGTGATTTGACTGTTT-3’)) and Negative (SHC002, Non- 
Mammalian shRNA Control) (MISSION Sigma-Aldrich). Lentiviruses were gener- 
ated by co-transfection in 293T cells with pCMV-delta8.2”, pCAGG-VSV G1! and 
TRC1.5-pLKO.1-puro at a ratio of 1:0.25:1 using Lipofectamine 3000 (Invitrogen), 
cell culture media was replaced after 16hrs at a reduced volume and supernatant 
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harvested at 36h post-transfection before being filtered (0.45 1m) and aliquots 
frozen at —80°C. 293T, A549 or DF-1 cells were transduced with lentiviral vectors 
for 16h before media was replaced. After 72h incubation, cells were split and cell 
culture media was replaced containing 0.5,1g ml! puromycin (Invivogen). Cells 
were incubated a further 72h after selection before analysis. 

siRNA mediated silencing. DF-1, 293T or RH clone 476 cells were transfected with 
100 nM of siRNA using HiPerFect transfection reagent in 48-well plates, according 
to manufacturer's instructions (Qiagen). 48 h later cells were transfected with poly- 
merase and minigenome constructs and harvested after a further 20h, for luciferase 
quantification and knockdown analysis. Total RNA was extracted as described 
previously but with 100 11 of cell lysate added to AVL buffer before continuing with 
the RNeasy mini kit (Qiagen). siRNAs for target genes were as follows:, AllStars 
Negative Control, huaANP32A (SI02655212 FlexiTube), huANP32B (SI02655380 
FlexiTube) (Qiagen), 50-92 NP (5’-AAGGAUCUUAUUUCUUCGGAG-3’), 
chANP32A (5’-GAGCTGGAATTCTTGAGTACA-3’) (custom RNA oligos, 
Sigma-Aldrich). 

Quantification of chANP32A and B mRNA levels. Total RNA from RH clones 
and DF-1 cells were extracted using an RNeasy mini kit (Qiagen), following man- 
ufacturer’s instructions. During extraction of RNA, RNeasy columns were treated 
with RNase-Free DNase (Qiagen). RNA samples were quantified using a Nanodrop 
Spectrophotometer (Thermo Scientific). Equal concentrations of RNA were 
subject to first strand synthesis using QuantiTect Reverse Transcription Kit (Qiagen) 
with primers specific for chANP32A (5'-CAACTGTAGGTCATACGAAGGC-3’) 
and chANP32B (5’-GGTGGCCTTGAAGTTCTAGC-3’). This product was then 
quantified with Mesa Green quantitative PCR (qPCR) MasterMix Plus for SYBR 
Assay I dTTP (Eurogentec) using the primers for first strand synthesis together 
with chANP32A (5’-GTTTGCAACTGAGGCTAAGC-3’) and chANP32B 
(5’‘-ATGAGCATCGTCACCTCGC-3’). Real-time quantitative PCR analysis was 
performed (Applied Biosystems ViiA 7 Real-Time PCR System) and absolute copy 
numbers of either chANP32A or B calculated using a standard curve of known 
concentrations of the corresponding cDNA expression plasmid. Primers were 
designed to be specific to their target transcripts by using BLASTX against both 
the hamster and chicken genomes. 

Quantification of RNA generated by influenza polymerase. Purified total 
RNA (1000 ng) was subject to first-strand cDNA synthesis with gene spe- 
cific primers, oligo(dT)20 or random hexamers (to amplify mRNA) using 
SuperScript III (Invitrogen) followed by RNase H treatment (Invitrogen). 
Primer design was based on Obayashi et al. (2008)** for quantification of 
RNA species of the luciferase minigenome driven by reconstituted polymer- 
ase, and UDL PB2 RNA species were quantified using a tagged-primer system 
adapted from Kawakami et al. (2011)*. First strand primers included: luciferase 
vRNA (5’- TATGAACATTTCGCAGCCTACCGTAGTGTT-3’), luciferase CRNA 
(5'‘-AGTAGAAACAAGGGTG-3’), luciferase mRNA (Oligo(dT)20), UDL PB2 
vRNA (5'-GGCCGTCATGGTGGCGAAT (tag)GATGCGTGATGTATT 
GGGAAC-3’), UDL PB2 cRNA (5’-GCTAGCTTCAGCTAGGCATC (tag) AGTAGAA 
ACAAGGTCGTT-3’), UDL PB2 mRNA (Oligo(dT)20) and 18S ribosomal RNA 
(Random Hexamers (Invitrogen)). After first strand synthesis, 1 ul of CDNA 
was subject to real-time quantitative PCR analysis with a gene specific primer 
pair using SYBR green PCR mix (Applied Biosystems) and analysed on the 
Applied Biosystems ViiA 7 Real-Time PCR System. Gene specific primer pairs 
were as follows: Luciferase gene (5/-CCGGAATGATTTGATTGCCA-3’ and 
5’-TATGAACATTTCGCAGCCTACCGTAGTGTT-3’), UDL PB2 vRNA 
(5'-GGCCGTCATGGTGGCGAAT (tag)-3’ and 5’-CCTCTCAACACTGCA 
GATTCC-3’), UDL PB2 cRNA (5’-GCTAGCTTCAGCTAGGCATC (tag)-3’ and 
5'-GGAATCTGCAGTGTTGAGAGG-3’), UDL PB2 mRNA (5/-GATGCGT 
GATGTATTGGGAAC-3! and 5’-CCTCTCAACACTGCAGATTCC-3’) and 
18S ribosomal RNA (5’-GCAAATTACCCACTCCCG-3’ and 5‘-CTGCAGCAA 
CTTTAATATACGC-3’). Fold change RNA to PB2 627E with Empty vector was 
calculated by AAC, including normalization to Cy values of 18S ribosomal RNA. 
Immunoblot analysis. Cells were lysed in Passive Lysis buffer (Promega) or 
NP40 lysis buffer (for cellular fractionation) and prepared in Laemmli 2x 
buffer (Sigma-Aldrich). Cell proteins were resolved by SDS-PAGE using Mini- 
PROTEAN TGx Precast Gels (Bio-Rad). Immunoblotting was carried out using 
the following primary antibodies: anti-chANP32A rabbit polyclonal (LS-B10851, 
LifeSpan BioSciences, Inc.), anti-huANP32A rabbit polyclonal (AB51013, Abcam), 


anti-huANP32B rabbit monoclonal (AB184565, Abcam), a-vinculin rabbit 
monoclonal (AB129002, Abcam), anti-Flag M2 mouse monoclonal (F1804 or 
F3165, Sigma-Aldrich), anti-PB2 rabbit polyclonal (2N580, a kind gift from 
Paul Digard, Roslin Institute), and followed with secondary horseradish perox- 
idase-conjugated (HRP) antibodies: anti-mouse IgG (H/L):HRP goat polyclonal 
(STARI17P, AbD Serotec) and anti-rabbit IgG:HRP sheep polyclonal (STAR54, 
AbD Serotec). For quantification of cellular fractions, the following secondary 
antibodies were used: anti-rabbit IgG (H/L):DyLight 800 (5151P, Cell signalling) 
and anti-mouse IgG (H/L):DyLight 680 (5470P, Cell signalling). Protein bands 
were visualized by chemiluminescence (ECL+ western blotting substrate, Pierce) 
using a FUSION-FX imaging system (Vilber Lourmat). 
Cellular Fractionation. 293T cells were transfected with empty vector or ANP32 
plasmid together with the polymerase complex and NP of 50-92 (PB2 627E) and 
pHOM1-firefly minigenome reporter. After 24h, cells monolayers were washed 
in ice-cold PBS and lysed in 0.1% NP40 buffer (50 mM Tris pH 7.5, 150 mM NaCl, 
0.1% NP40 and protease inhibitors (EDTA-free COMPLETE tablet (Roche)). 
Total lysates were centrifuged at 228 g for 5 min at 4°C. Supernatant was removed 
(Cytoplasmic fraction) and the nuclear pellet was resuspended in 1% NP4O0 (as 
above) and subject to syringing with a 25G needle. Fractions were analysed by 
immunoblotting. 
Bioinformatic and statistical analysis. Analysis of Microarray data was performed 
as previously mentioned, using Affymetrix GeneChip Command Console Software 
Version 3.0.1. and Partek Software Version 6.6. Statistical analysis of biological 
replicates was performed by One-way ANOVA with Dunnett’s multiple com- 
parison analysis or Two-way ANOVA with Sidak multiple comparison analysis, 
using GraphPad Prism 6. Sequence alignments were performed using Geneious 
R6 software. Quantification of immunoblots was performed using Image Studio 
Lite V5.2. 

No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and the investigators were not blinded to allocation during 
experiments and outcome assessment. 
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Extended Data Figure 1 | Analysis of mRNAs by PCA mapping reveals of negative clones (red), parent Wg3H cells (blue) and positive clones: 


diversity of the radiation hybrid clones and their genetic instability 377 (purple), 386 (orange) and 365 (green) and 476 (cyan) arrays are 
during cell passage. Each sphere represents a microarray sample. The distinguished by colour, and passage numbers 1 and 12 are distinguished 
percentage values in the axes parentheses designate proportion of overall by the size of spheres. Negative arrays are dispersed, while parent cells 
variance as described by each PC. PC1 principal component 1 (x-axis); are accumulated further to the right of PC1 and upwards of PC2. Positive 
PC2 principal component 2 (y-axis); PC3 principal component 3 (z-axis). clones show distinct variability in their location while passaging reduced 
PC1 describes the predominant amount of variance (15.6%). Selection their separation from parent cells. This analysis accompanies Fig. 1. 
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Extended Data Figure 2 | Confirmation of chANP32A and chANP32B numbers of chANP32A mRNA were calculated by RT-PCR against a 


expression in RH clones by qRT-PCR. RNA was extracted from the standard curve generated with chANP32A cDNA using primers specific 
RH clones after testing for influenza polymerase activity and analysed for chANP32A. b, Copy number of ANP32B mRNA were measured by 

by microarray for chicken transcripts. The same RNA was used to qRT-PCR against a standard curve generated with chANP32B cDNA using 
validate identification of ANP32A by confirming the level of expression primers specific for chANP32B. (n= 3 technical replicates; error as s.e.m.). 
of ANP32A (and ANP32B as control) in the parent Wg3h cells, positive This analysis accompanies Fig. 1. 


clones, passaged positive clones and a selection of negative clones. a, Copy 
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Extended Data Figure 3 | Knockdown of chANP32A in positive RH 
clone 476 diminished the ability to support avian influenza polymerase 
activity. a, Positive RH clone 476 cells were transfected with 100 nM of 
siRNA targeting NP, chANP32A or no target (Allstars). After 48 h cells 
were transfected with mouse-poll-firefly minigenome reporter, avian 
influenza polymerase (H5N1 50-92) with either PB2 627E or 627K, Renilla 
control and either empty plasmid or codon optimised chANP32A (codon 


Allstars NP chANP32A 


siRNA 


optimization according to algorithm by GeneArt with manual editing). 
(Data are luciferase activity measured after a further 24h; n =3 biological 
replicates; errors are displayed as s.e.m.). b, Knockdown of chANP32A 
was confirmed by qRT-PCR of RNA extracted from siRNA treated cells, 
calculated using a standard curve generated with chANP32A cDNA, using 
primers specific for chANP32A (n =3 biological replicates; errors are 
displayed as s.e.m.). This analysis accompanies Fig. 1. 
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Extended Data Figure 4 | Expression of chANP32A in human cells 
permits influenza polymerase activity of several avian influenza 
polymerases and an avianized human influenza polymerase and 
increases avian virus replication. 293T cells were transfected with empty 
vector, ChANP32A or huANP32A. a, b, 20h later, cells were transfected 
with pHOM1-firefly minigenome reporter, and the polymerase set from 
low pathogenicity avH1N1 (Bav) or H9N2 (UDL), highly pathogenic 
H5N1 (50-92), H5N1 (Ty05), or huH3N2 (Vic) viruses, with either PB2 
627E (a) or 627K (b) and Renilla expression control. After a further 

24 h luciferase activity was measured. (Data are mean PB2 627E or K 
polymerase activity normalized to Renilla; n= 3 biological replicates; error 


plotted as s.e.m. of the ratio; pattern of results consistent in at least three 
independent experiments). This analysis accompanies Fig. 1. c, 20h after 
transfection with ch or huANP32A or empty vector, cells were infected 
with avian-like influenza virus (H5N1Ty05:PR8 6:2 recombinant virus) 
(MOI 0.1) bearing PB2 627E (black bars) or PB2 627K (grey bars). Infected 
cells were incubated at 37 °C and cell supernatant titrated for infectious 
virus at 24h post infection on MDCK cells by plaque assay.(Data displayed 
as logio plaque forming units per ml; n =3 biological replicates; error 
plotted as s.e.m.; one-way ANOVA, comparisons to empty vector, NS= not 
significant, *P < 0.05 ***P < 0.001; pattern of results consistent in at least 
three independent experiments). This analysis accompanies Fig. 2. 
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Extended Data Figure 5 | chANP32A does not alter expression or were analysed by immunoblotting. b, Total lysates were immunoblotted 
nuclear accumulation of avian PB2 protein in human cells. 293T cells for vinculin, PB2 and Flag peptide. c, Immunoblots were quantified 
were transfected with pHOM1-firefly minigenome, avian influenza using Image Studio Lite V5.2. The ratio of nuclear to cytoplasmic PB2 
polymerase and NP of H5N1 50-92 (PB2 627E) together with empty was calculated by dividing the ratio of PB2 to vinculin by the ratio of 
vector, chANP32A or chANP32AA33 or cells were untransfected (Mock). PB2 to lamin B from the cytoplasmic and nuclear fractions, respectively. 
Cell monolayers were harvested after 24h and lysed in 0.1% NP40 lysate Data are the mean ratios from three independent experiments (excepting 
buffer and total fractions taken before centrifugation to generate a nuclear = chANP32A33 for which only 2 data points were available), error bars are 
pellet and cytoplasmic fraction. Nuclear pellets were resuspended in 1% s.e.m. Data are not statistically significantly different by one-way ANOVA. 
NP40 buffer. a, Protein levels of vinculin (cytoplasmic marker) and lamin This analysis accompanies Fig. 2. 
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Extended Data Figure 6 | Quantification of knockdown of chANP32A 
in chicken cells. DF-1 cells were transduced with VSV-G lentiviral vectors 
that delivered a transgene expressing shRNA directed against chANP32A 
or a negative sequence and the puromycin gene. Puromycin selected cells 
were transfected with siRNA (100nM) (underlined). RNA was extracted 
from untreated shRNA cells and siRNA-treated shRNA cells. Knockdown 
of chANP32A was quantified by qRT-PCR of the extracted RNA, 
calculated using a standard curve generated with chANP32A cDNA, using 
primers specific for chANP32A. Fold decrease of RNA copies is displayed 
compared to negative shRNA DF-1 or ALLstars treated chANP32A shRNA 
DF-1 cells. (n = 3 biological replicates; error displayed as s.e.m.). This 
analysis accompanies experiments in Fig. 3a-c. 
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Extended Data Figure 7 | siRNA knockdown demonstrates that human- 
adapted influenza polymerase activity is dependent on huANP32A and 
huANP32B in human cells. a, 293T cells were transfected with siRNA 
(100 nM) against NP, ho ANP32A, huANP32B or both huANP32A and 
huANP32B (50 nM each). After 48h, cells were transfected with pHOM1- 
firefly minigenome, human-adapted avian influenza polymerase (H5N1 
50-92 PB2 627K), and Renilla expression control. Luciferase activity 
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was measured after a further 24h. (Data are firefly activity normalized 
to Renilla, plotted as % of Allstars; n = 3 biological replicates; error as 
s.e.m.; one-way ANOVA comparisons to Allstars, ****P <0.0001;). 

b, Knockdown of gene targets was verified by immunoblotting using 
antibody against vinculin, huANP32A and huANP32B. This analysis 
accompanies Fig. 3e, f. 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


i 


UaanAcaah 


fs es ese 
pafakababebafab al 
tea es fs] ales es] (~ el 
alejaelelejelele| 
fa) ses es 


AamAahrAnAA 


ajaieyeleyele} 
qeaereeyees 
>>> FFF rei 


x] 
BeqoRRmAmAaa 


207 
ED 
aoe P 


a 


47 


256 
bd 


— ED) 


lL ee | eee ee 


7 


G 


K oe 


i 


i} a 
ae] iA 
>| 
ee y, a 
not eee @@e@nn 
> bei cb tec Once eieoge 
ale ho ete N= |4|414 (41414 I) 
u) ~ a Oe ae Pa | Pan! bn] baal Pal Pol Pa] been) | 1] 
prrrri | Sg eae 
<<] <<] <n (ea) | (3) (ea) ea) (ea! (ea! (ea (ea) 
Weta oe ex] af] affable] 
rrrrnd anole elejel|ogia 
RO Se ee Oe (ea | (fa) (ea ea fe (ea) (ce) 
A Poke bot (ea) | (fa) (ea iba fe (ea) (ee) Le Le 
=e ecb Sob af (ea | (ea) (ea 3) ea) ea] eae) |e 
se ie a (ea) | (ea) (ea ea) et ea ee) () 0) 
Wa\Wanane | | | i ill siggegegeg>> la 
Ic [Cae ae a colicalca|calca|caica| cafes fa 
c eee >>|>|>|>|>|>|>[aral (| 
fad |fadfegien | | | 1 od A clea ete} ja 
@@@m ii iit (ea | (eo ea a ie (ea ea (ea ea) ja 
De ecb Se ae x 2fafabeateabafafe| =] 
> >|> (ea) | (fa) (ea iba (ea) (ea) (ea) (ea) ea) xy 
pirrrred © eoegogoud fa 
|: (ea | ea) iea ea feat ea) ea eee) =] 
Fort tt 0 | Mea) avteay ey ea) eal eal iea) 5 > | la) 
(ea | (eal (ea es) (ea ea ete) >) fa) 
ey geeeedece 
re] 
xy eo gece IO 
| 
1 Valdlala , 


Aax<>aaanac 


SA NNAMNNNOS 


aajeieeiqcsy 
qqayayaqyajaja 


5 


1 Ue ie ie | 
Hyerr@iriig ofcleleleleiciogm AAR aOMOmO 
I (ea)| esl ea) eal eal eal oslo! esl) SR OORMOO) alaqgandadmn 
afeleieialieieiegge afeabeabeabeaafpaa 
a BF eeeeseceoe aR olclegiclala| 
a (ea) | (ea ea] ea ea] ea ea] ea] > moss 
(eel ease Sach actscheag ed B\E<SeBho008Hn (ea) | (ea (eal ea) [ea] [eal [eal 
1 ed S111] onayaieyeye} lok (ea) | {ea} [ea] fea) (ea (ea! fea) (ea) [eal ea 
fac | far fa 0 (fla (ae faa fn (a nayeleaiq Ae] onaieyjejejelejeiala 
>. s se 5-S SSS SS | SE) fs alesies tess eae) | | 
©-[ea)| ea) eal) +4 [ea ea eae i @gugead i @ ko} Ir rrrr rl 
H)aa>ossansa (es) | (es (es (es es fea ea] esl) () Sie >> >> Ebi 
mimeo med| eden donn fo | fa Da Da Doe oe g liciolololololo aia 
AAA Renae | &-S | SS SSHHeHpe Real | al (eal Can an) baal nl Pa | <q] ROBBED... 
| gigeeqgae.:| Basses eas 
M <)/S RA CeRR GH ra fal favayey ss] sists ley ss] 
Mm 2 SS <) ORayaia|q 
8 OPOOIOOojojokgge fa) sla ra | bal ba be ba 
a HS [> /> [>>> >] fea] | (ea ea] ea (es (es eae] (>) AanAAnMHAAY 
| za 2|2)2)2)2}2]}2]2]2] ES OBOAIA|A|)0 19 [Oo] OF Ieee fea Ez 11 t 1 oM@oanm 
< < < < < < 
< Scccc Ba § Boece Ba < Becee Ba S| Becee Ba § Seace Ba < Becca 2 
< a < a aig a < a <i a < a 
QM eSSSN5 AN) A aSSANSS OA) ALSSSNN BH AGSSSSS BH) Bl LSSASN BN) SB LSSASA AT 
Qiatorpracse, BAcMPRRche) BWacoPprr«te,) GacwPrr«te| Aiacepeep«te AacoPprr«ste 
ZecesSsazS 2gecasSsazS| ZerasSSazs| ZorasSSazS) ZiorasSSaz5) Z2iorasSsazs 
qO6Z2222942 <4 CS2Z22254¢2| CCGZSZEE942| LO6ZseZ2HC2) <CEZZZSZH4%Z) C/G EZZZSEHICE 
c|/Zer<¢ Qe <|\Zet< Qe c|\Zec<¢ Oe <\Zecc Qe c|\Zecc Oe c|/Zere¢ Qe 
SSS >SESSSS) SLHSESBSSSE| SSRSSSBSSS BIHSSESBSSE BIRZSSSBSLE BIHFSEBZLE 
S\SSLEESSOE SSSLEESSOE! SSSLEESSSE! SSSLEESSSE! S/SSLZEESSSE! S/SSLEESSSE 
SlSNSSZEaGe SSKScezeas2| SSKSsZeasZ| SSNSSZEaGS Gok sseease| SsRsszZeaseZ 


colours represent similarity of amino acid identity (black, 100%; dark grey, 


80-100%; light grey, 60-80%; white, <60%). Gaps are annotated by dashes. 


Residue numbers correspond to chANP32A. The 33 amino acid sequence 
found in avian species is situated between residues 176-208. This analysis 


Geneious R6 software. chANP32A is set as the reference sequence, and 
accompanies Fig. 4. 
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Extended Data Figure 8 | Alignment of ANP32A proteins reveals 
with sequences of ANP32B for chicken and human were aligned using 


LETTER 


o>) 
J 


firefly polymerase activity 
normalised to Renilla 
oO Nh b 
! | 
empty 


chANP32A -—a 


huANP32A 


zfANP32A 


tyANP32A 
osANP32A 
muANP32A 


dkANP32A -— 
pgANP32A 


chANP32ApD33 
huANP32A+33 


chANP32B a 
huANP32B IH 
huANP32B+33 —_ 


Extended Data Figure 9 | Expression of ANP32A and B proteins 
reduced human-adapted influenza polymerase activity in human cells. 
293T cells were transfected with Flag-tagged ANP32 constructs and after 
20h transfected with pHOM1-firefly minigenome reporter, human- 
adapted influenza polymerase (H5N1 50-92 with PB2 627K, together with 
Renilla expression control. Cells were assayed for luciferase activity 24h 
later. (Data are PB2 627K polymerase activity normalized to Renilla; n=3 
biological replicates; error plotted as s.e.m. of the ratio; one-way ANOVA, 
all constructs were significantly reduced compared to empty vector 

(P < 0.0001); pattern of results consistent in at least three independent 
experiments). These data relate to Fig. 4. 
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Extended Data Table 1 | Genes common between the avian 
influenza polymerase positive radiation hybrid cells 


Venn 1 


Venn 2 


Gene Name 
IQSEC3 
CCDC77 
WNK1 
KDM5A 
MAP2K1 
ZWILCH 
PIAS1 
FEM1B 
GLCE 
KIF23 
SPG11 
TIPIN 

RPL4 

CLN6 
ANP32A 
MORF4L1 
EIF3J 
CASC4 
BLM 

NUB1 
Unannotated 
RBM33 
WDR48 
PAXIP1 
Unannotated 
HPRT1 
DDX26B 
SLC9A6 
HTATSF1 
LOC422249 
FAM122B 
MOSPD1 
MMGT‘1 
MIR1726 
LOC430470 
MAP2K1 
PIAS1 
FEM1B 
GLCE 
KIF23 
SPG11 
TIPIN 

CLN6 
ANP32A 
EIF3J 
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BLM 


Chromosome 
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Microarray analysis of RNA extracted from RH clones revealed the expression of chicken genes. 
The gene list from Venn 1 shows common genes between the P1 positive hybrid clones when 
compared to parent Wg3h hamster cells. The gene list from Venn 2 shows common genes 
between P12 positive hybrid clones when compared to parent and 365P1 (positive RH clone) vs 
365P12 (reverted RH clone). This analysis accompanies the microarray analysis Venn diagrams 


in Fig. lb andc. 
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A LAIR1 insertion generates broadly reactive 
antibodies against malaria variant antigens 


Joshua Tan!***, Kathrin Pieper’, Luca Piccoli!*, Abdirahman Abdi?, Mathilde Foglierini!, Roger Geiger, 
Claire Maria Tully’, David Jarrossay!, Francis Maina Ndungu?, Juliana Wambua’, Philip Bejon**, Chiara Silacci Fregni!, 
Blanca Fernandez-Rodriguez', Sonia Barbieri!, Siro Bianchi®, Kevin Marsh**, Vandana Thathy’, Davide Corti®, 


Federica Sallusto!, Peter Bull?*§ & Antonio Lanzavecchia!“s 


Plasmodium falciparum antigens expressed on the surface of infected 
erythrocytes are important targets of naturally acquired immunity 
against malaria, but their high number and variability provide the 
pathogen with a powerful means of escape from host antibodies’ *. 
Although broadly reactive antibodies against these antigens could be 
useful as therapeutics and in vaccine design, their identification has 
proven elusive. Here we report the isolation of human monoclonal 
antibodies that recognize erythrocytes infected by different 
P. falciparum isolates and opsonize these cells by binding to members 
of the RIFIN family. These antibodies acquired broad reactivity 
through a novel mechanism of insertion of a large DNA fragment 
between the V and DJ segments. The insert, which is both necessary 
and sufficient for binding to RIFINs, encodes the entire 98 amino 
acid collagen-binding domain of LAIR1, an immunoglobulin 
superfamily inhibitory receptor encoded on chromosome 19. In each 
of the two donors studied, the antibodies are produced by a single 
expanded B-cell clone and carry distinct somatic mutations in the 
LAIR1 domain that abolish binding to collagen and increase binding 
to infected erythrocytes. These findings illustrate, with a biologically 
relevant example, a novel mechanism of antibody diversification 
by interchromosomal DNA transposition and demonstrate the 
existence of conserved epitopes that may be suitable candidates for 
the development of a malaria vaccine. 

To identify individuals who may produce antibodies that broadly 
react with P falciparum-infected erythrocytes (IEs), we developed 
an improved mixed agglutination assay (Fig. la). Plasma from adults 
(n=557) living in a malaria-~endemic region in Kilifi, Kenya, were ini- 
tially tested in pools of five (Fig. 1b) and then individually for their 
capacity to agglutinate mixtures of erythrocytes infected with three 
culture-adapted Kenyan parasite isolates, each stained with a different 
DNA dye. Most plasma samples formed single-colour agglutinates, 
but three were able to form mixed-colour agglutinates with at least six 
isolates (Fig. 1c). 

From two selected donors (C and D) whose plasma formed mixed 
agglutinates with eight parasite isolates, we immortalized immunoglob- 
ulin G (IgG)+ memory B cells® and screened the culture supernatants 
for the capacity to stain erythrocytes infected with the eight isolates. 
Surprisingly, most antibodies isolated from these donors stained mul- 
tiple isolates, with the best antibodies, such as MGC34, MGD21 and 
MGD339, recognizing all eight isolates tested (Fig. 1d). Conversely, a 
few antibodies, such as MGD13, were specific for a single isolate. In 
all cases, only a fraction of IEs were stained (Fig. le) and this fraction 
varied with different antibodies, possibly reflecting different clonal 
expression of the relevant antigen. Overall, these findings show that 
broadly reactive antibodies against IEs can be generated in response 
to malaria infection. 


We investigated the molecular basis of the broad antibody reactivity 
by comparing the sequences of the antibodies isolated from the two 
donors. While the antibodies with narrow reactivity showed classical 
VDJ organization of the heavy (H) chain gene, all the broadly reac- 
tive antibodies (14 from donor C, 13 from donor D) carried a large 
insert of more than 100 amino acids between their V and DJ segments 
(Fig. 2a and Extended Data Figs 1-3). In both donors, the core of the 
inserts encoded an amino acid sequence that was 85-96% identical 
to the extracellular domain of LAIR1, a collagen-binding inhibitory 
receptor encoded in the leukocyte receptor locus on chromosome 19 
(ref. 6). However, in each donor, the broadly reactive antibodies used 
a distinct VH/JH combination (VH3-7/JH6 in donor C and VH4-4/ 
JH6 in donor D) and had junctions of distinct length between the V, 
LAIRI and J segments. In addition, the broadly reactive antibodies 
from donor D shared a single light (L) chain (VK1-8/JK5), while the 
antibodies from donor C had one of three different L chains (VK1-5/ 
JK2, VK4-1/JK2, VL7-43/JL3) (Extended Data Fig. 4). All the broadly 
reactive antibodies carried a high load of somatic mutations spanning 
the whole V-LAIR1-DJ region. The mutations in the VH segment were 
used to reconstruct genealogy trees showing a developmental pathway 
with progressive acquisition of somatic mutations (Fig. 2b, c). Notably, 
the trees were consistent with those generated using only the LAIR1 
insert or the VL sequence (Extended Data Fig. 5). These findings indi- 
cate that, within each individual, a single B-cell clone carrying a LAIR1 
insert expanded after stimulation by malaria antigens and progressively 
accrued mutations in the LAIR1, VH and VL regions. 

To explore the mechanism that led to the generation of the LAIR1- 
containing antibodies, we compared complementary DNA and 
genomic DNA sequences obtained from the antibody-producing B-cell 
clones (Fig. 2d). In both donors, the genomic DNA contained a LAIR1 
insert that was larger than that found in the corresponding cDNA. In 
particular, in donor C, the insert comprised not only the 294 base pair 
(bp) exon encoding the extracellular LAIR1 domain, but also a 190 bp 5’ 
intronic region of the LAIRI gene that was partially spliced out in the 
messenger RNA, and a shorter 23 bp 3’ intronic region that was main- 
tained in the mRNA (Extended Data Fig. 6a). Donor D had a some- 
what different genomic insertion, with larger 5’ (378 bp) and 3’ (60 bp) 
LAIR1 intronic sequences, and, 5’ of the LAIR] insertion, an additional 
sequence of 135 bp corresponding to an intergenic sequence of chromo- 
some 13 (Extended Data Fig. 6b, c). In this donor, the entire LAIR1 5/ 
intronic sequence and much of the 5’ chromosome 13 sequence were 
spliced out in the mRNA (Extended Data Fig. 6d). The spliced intronic 
LAIRI region contained a duplicated 135 bp element with a very high 
load of somatic mutations (Extended Data Fig. 6e). 

The finding that the inserts were located exactly between the V 
and DJ segments and were joined to these segments by N nucleotides 
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Figure 1 | Identification of broadly reactive monoclonal antibodies 
against IEs. a, Fluorescence microscopy images of single agglutinates 
(top) and a triple agglutinate (bottom). Scale bar, 501m. b, c, Plasma 
(pooled in groups of five) from immune adults were screened against six 
parasite isolates using the triple mixed agglutination assay (b). Pools that 
formed mixed agglutinates with at least five isolates (in red) were further 
investigated for individual reactivity against an extended panel of eight 
isolates (c). d, Heat map showing the percentage of IEs of eight parasite 
isolates stained by monoclonal antibodies isolated from two donors 

(n= 1). Closely related antibodies are grouped in alternating colours. 

e, Example of staining of IEs by the broadly reactive antibody MGD55. 


would suggest that RAG might be involved in the insertion pro- 
cess. Indeed, cryptic recombination signal sequences (RSSs) that 
followed the 12/23 rule were found flanking both the LAIR1 and 
the chromosome 13 inserts, although their RSS prediction scores 
were low and they were not positioned precisely at the ends of the 
inserts (Extended Data Fig. 7). As RAG acts by excising a target 
DNA sequence, we investigated whether, in B cells making LAIR1- 
containing antibodies, one of the two LAIR1 alleles on chromosome 
19 would be deleted. By sequencing genomic DNA from T cells of 
donor C, we identified a heterozygous nucleotide site in the chro- 
mosome 19 LAIR1 exon sequence. Surprisingly, both alleles were 
also present in the B cells producing LAIR1-containing antibodies 
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(Extended Data Fig. 8), a finding that is inconsistent with the 
‘cut-and-paste’ function of RAG. 

To determine the contribution of the mutated VH, VL and LAIR1 
domains to the antibody specificity, we generated a panel of constructs 
and fusion proteins based on the broadly reactive antibody MGD21 
(Fig. 3a). Substitution of the V, J or L chain of MGD21 with that of an 
unrelated antibody did not affect binding to IEs (Fig. 3b), suggesting 
that these elements are dispensable for binding. In contrast, deletion of 
the LAIR1 insert, or its reversion to the unmutated genomic sequence, 
led to a complete loss of binding. Furthermore, fusion proteins dis- 
playing only the mutated LAIR1 domain bound to IEs, although with 
lower affinity. To dissect the contribution of the somatic mutations 
of the LAIR] insert to antigen binding, we created a set of LAIR1-Fc 
fusion proteins carrying, in various combinations, the mutations shared 
by MGD21 with other antibodies of the same clonal family. We tested 
the mutants for binding to IEs and to collagen, which is the natural 
ligand of LAIR1. Interestingly, two distinct kinds of mutations were 
identified: those that reduced collagen binding (P106S and P107R) and 
those that increased binding to IEs (T67L, N69S and A77T) (Fig. 3c). 
Collectively, these findings indicate that the binding of the broadly 
reactive antibodies to IEs relies mainly on the mutated LAIR1 domain, 
which evolves under selective pressure to lose collagen binding and 
gain binding to IEs. 

To identify the antigen(s) recognized by the LAIR1-containing 
antibodies, we generated stable P. falciparum 3D7 lines that were 
enriched (3D7-MGD21*) or depleted (3D7-MGD217) of MGD21 
reactivity (Extended Data Fig. 9a). Western blot analysis showed 
two specific MGD21-reactive bands of 40-45 kilodaltons (kDa) in 
erythrocyte ghosts and in MGD21 immunoprecipitates prepared 
from 3D7-MGD21" IEs (Fig. 4a). Analysis of the MGD21 immu- 
noprecipitates by liquid chromatography coupled with mass spec- 
trometry (LC-MS) revealed that a member of the A-type RIFIN 
family (PF3D7_1400600) was significantly enriched in 3D7-MGD21* 
immunoprecipitates as compared to 3D7-MGD21~ immunoprecipi- 
tates (log, fold change >2; P< 0.01) (Fig. 4b). PF3D7_1400600 and 
a second A-type RIFIN (PF3D7_1040300) were also identified in 
3D7-MGD21* but not in 3D7-MGD21~ ghosts in the absence of 
immunoprecipitation (Extended Data Fig. 9b). In contrast, four other 
RIFINs, including one recently characterized for its capacity to induce 
rosetting (PF3D7_0100400)°, were detected in similar amounts in 
both 3D7-MGD21* and 3D7-MGD21~ ghosts. We found that enrich- 
ment for 3D7-MGD21* IEs greatly increased recognition by all the 
other broadly reactive antibodies from donor D tested and, notably, 
by two broadly reactive antibodies from donor C (Extended Data 
Fig. 9c), suggesting that these antibodies recognize the same antigens. 
Similar results were obtained with the Kenyan isolate 9605 (Extended 
Data Fig. 9d, e). 

The binding of the LAIR1-containing antibodies to specific 
RIFINs was confirmed by the finding that MGD21 stained CHO 
cells transfected with the candidate antigens PF3D7_1400600 and 
PF3D7_1040300, but not with irrelevant RIFINs that were similarly 
expressed (PF3D7_0100400 and PF3D7_0100200) or not detected 
(PF3D7_1100500) in 3D7-MGD21+ and 3D7-MGD21~ ghosts 
(Fig. 4c). Furthermore, MGD21 and an Fc fusion protein containing 
the MGD21 LAIR1 domain stained CHO cells transfected with a RIFIN 
chimaera containing the constant region of PF3D7_0100200 and the 
variable region of PF3D7_1400600, but not cells transfected with the 
inverse chimaera (Extended Data Fig. 9f, g), indicating that MGD21 
binds to the variable region. Collectively, these results indicate that the 
LAIR1-containing antibodies recognize specific members of the RIFIN 
family in different P falciparum isolates. 

Addition of MGD21 to 3D7 culture did not interfere with parasite 
growth and did not result in decreased expression of the antigen(s) 
(Extended Data Fig. 9h, i). In addition, when tested in a rosette 
inhibition assay with O* or A* erythrocytes, MGD21 did not 
show a consistent inhibitory effect (P > 0.1 for both blood groups) 
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Figure 2 | Broadly reactive antibodies contain a mutated LAIR1 

insert and are produced by expanded clones. a, Protein sequence 
alignment of MGC1 and MGD21 with germline-encoded sequences of the 
corresponding VH (green or purple), DH (cyan), JH (blue) and LAIR1 
segments (exon in red and intronic sequences in light red). Chromosome 
13 (Chr13) sequences are shown in orange while grey areas show 
junctional sequences for which no homology was found. b, c, Genealogy 
trees drawn from the VH nucleotide sequences of antibodies from donors 


(Extended Data Fig. 9j). In contrast, MGD21, as well as MGC34, 
could agglutinate erythrocytes infected with 3D7 or the Kenyan iso- 
late 11019 (Extended Data Fig. 9k). Furthermore, MGD21 showed 
a strong capacity to opsonize 3D7 IEs for phagocytosis by human 
monocytes (Fig. 4d). Opsonization was dependent on an intact Fc, as 
a mutant lacking Fc receptor binding (MGD21 LALA) did not induce 
phagocytosis. Similar results were obtained with other broadly reactive 
antibodies isolated from both donors and with a different parasite 
isolate (11019) (Extended Data Fig. 91), suggesting that these broadly 
reactive antibodies could be effective in promoting phagocytosis and 
destruction of IEs in vivo. 

Our study opens several questions as to the potential use of RIFINs 
as targets for passive and active vaccination. RIFINs represent the larg- 
est family (~150 genes) of variant antigens expressed on IEs, some of 
which have been implicated in severe malaria*. The LAIR1-containing 
antibodies have potent agglutinating and opsonizing activity, which 
would be consistent with their role in decreasing the burden of IEs 
in vivo by enhancing parasite clearance. However, the staining of only 
a fraction of IEs by the LAIR1-containing antibodies is consistent with 
the clonal expression of RIFINs* and suggests that these antibodies may 
not be sufficient to take full control of the infection. It will be interest- 
ing to determine whether the LAIR1-containing antibodies recognize 
RIFINs that are expressed at other stages of the parasite life cycle, such 
as sporozoites, merozoites and gametocytes””*, which may create new 
opportunities for vaccine design. 


Junction 


LAIR1 intron 


Genomic DNA 
v 


cDNA 


C (b) and D (c). In the donor C genealogy tree, antibodies that use 
different light chains are highlighted in different colours. Shown are the 
nucleotide and amino acid substitutions, with the latter in parentheses. 
UCA, unmutated common ancestor. d, Scheme showing genomic DNA 
and cDNA of LAIR1-containing antibodies from donors C and D. Shown 
are the lengths of the fragments (bp in parentheses), cryptic 12 and 23 
RSS sites (black and brown triangles, respectively) and splicing positions 
(dashed lines). 


The unusual architecture of the LAIR1-containing antibodies illus- 
trates a novel mechanism of interchromosomal DNA transposition that 
can contribute to antibody diversification (Extended Data Fig. 10). The 
precise location of the LAIR1 and chromosome 13 inserts between the 
V and DJ segments, as well as the presence of N nucleotides and cryptic 
12/23 RSSs at the ends of the inserts, would be compatible with a role 
for the RAG enzyme. RAG has been implicated in interchromosomal 
genomic rearrangements at cryptic RSSs outside the immunoglobulin 
and T-cell antigen receptor (TCR) loci”, and in the formation of chro- 
mosomal translocations found in human lymphomas"!. However, RSSs 
are frequently found in the genome and are generally inactive, according 
to recent data!”!°, Furthermore, the conservation of the two LAIR] alleles 
in B cells producing LAIR1-containing antibodies is inconsistent with a 
RAG-mediated ‘cut-and-paste’ pathway and suggests a new mechanism 
by which LAIRI DNA is duplicated. This mechanism may involve reverse 
transcription of pre-mRNA and subsequent insertion of the duplicated 
fragment to repair a DNA double-strand break, as recently proposed“. 
It is also possible that gene conversion!» or AID-dependent genomic 
instability caused by chronic Plasmodium infection!*® may contribute to 
the production of LAIR1-containing antibodies. AID can lead to inser- 
tions and deletions of multiple codons in the V genes, which contribute 
to the specificity of the antibody in the context of the whole V gene!”"®. 
Nevertheless, to the best of our knowledge, these insertions are distrib- 
uted over the whole V-gene sequence, are of smaller size and cannot be 
traced back to a particular genomic sequence as in the case of LAIR1. 
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Figure 3 | The mutated LAIR1 insert is necessary and sufficient for 
binding to IEs. a, Design of modified MGD21 antibody constructs with 
selected regions replaced with counterparts from an unrelated antibody 
(F1499) (C1-C2, C9), deleted (C3-C6), or reverted to germline (GL) 
(C7-C8). Fe fusion proteins that incorporated the LAIR] insert, junction 
and downstream sequences (F1), as well as the LAIR1 domain alone 

(F2), were also designed. b, Binding of MGD21 constructs and Fc fusion 
proteins to IEs (representative of n = 2 independent experiments). 

c, Selected amino acid substitutions found in MGD21 were added 
individually or in different combinations to the germline LAIR1-Fc fusion 
protein. These mutants were tested for binding to collagen and to IEs. 
Shown are the effects of the mutations on binding to IEs or collagen (one 
representative of n =2 independent experiments) and their location on 
the LAIR1 structure!? (Protein Data Bank accession 3KGR). Gain of IE 
binding is shown in green (background mean fluorescence intensity (MFI) 
values subtracted). Loss of collagen binding (half-maximum effective 
concentration (ECs9) enzyme-linked immunosorbent assay (ELISA) 
values) is shown in red. WTT, wild type. 


The transposition of LAIR1 (and chromosome 13) sequences into V-DJ 
genes is the first example of an insertion that gives rise to a functional 
antibody in which the insert represents the fundamental binding 
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Figure 4 | LAIR1-containing antibodies bind to distinct RIFINs 

and opsonize IEs. a, Western blot showing MGD21 binding to 
erythrocyte ghosts and MGD21 immunoprecipitates (IP) prepared 

from 3D7-MGD21* and 3D7-MGD21/ IEs (representative of n= 2 
independent experiments). Controls include uninfected erythrocytes 
(uEs) and immunoprecipitates with an irrelevant antibody (BKC3). 
Specific bands are marked with asterisks. Anti-human IgG was used 

as the secondary antibody, resulting in detection of antibodies used 

for immunoprecipitation alongside antigens of interest. For gel source 
data, see Supplementary Fig. 1. Numbers on right indicate kDa. 

b, Volcano plot from LC-MS analysis of MGD21 immunoprecipitates 
prepared from 3D7-MGD21"* IEs versus from 3D7-MGD21~ IEs (from 
n=4 independent experiments). Statistical significance was evaluated 

by Welch tests (P < 0.01 for PF3D7_1400600). c, MGD21 and BKC3 
staining of CHO cells transfected with a specific (PF3D7_1400600) or an 
irrelevant (PF3D7_0100200) RIFIN (representative of n = 5 independent 
experiments). d, Opsonic phagocytosis of 3D7-MGD21* IEs by monocytes 
(n=3 for MGD21, MGD21 LALA and BKC3, n= 2 for others). The IEs 
were stained with 4’,6-diamidino-2-phenylindole (DAPI), which was 
quantified in monocytes as a measure of phagocytosis. MGD21 LALA isa 
mutant of MGD21 lacking Fc receptor binding. 


element. It remains to be established how often this novel mechanism 
may give rise to functional antibodies and whether sequences other 
than LAIR] are transposed into immunoglobulin genes. We anticipate 
that LAIR1-containing antibodies will be frequently found in malar- 
ia-endemic regions and speculate that the transposed LAIR1 domain 
may serve to bind other foreign antigens and possibly also collagen in 
patients with rheumatic diseases. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Parasite culture and selection. The Plasmodium falciparum clone 3D7 and nine 
laboratory-adapted parasite isolates from severe and non-severe malaria patients 
in Kilifi, Kenya (sampled between 2009 and 2010), were cultured in vitro according 
to standard procedures” and cryopreserved at the late trophozoite stage for use 
in subsequent assays. To select for MGD21-reactive infected erythrocytes (IEs), 
cultured IEs were incubated with MGD21 for 20 min at room temperature, washed, 
and rotated with Protein G-coated magnetic beads (Life Technologies) for 30 min at 
room temperature. Following magnetic sorting, enriched (MGD21*) and depleted 
(MGD21_) fractions were returned to in vitro culture. 

Patients. Donors C and D are 29 and 38 years old, respectively, and are lifelong 
residents of an area with moderate malaria transmission intensity (that is, with 
an entomological inoculation rate of 21.7 infective bites per person per year)!. 
Adults in this area are clinically immune from febrile malaria, having acquired 
immunity during childhood. The two donors were P._falciparum-negative during 
sample collection. The experiments were not randomized. The investigators were 
not blinded to allocation during experiments and outcome assessment. 

Triple mixed agglutination assay. Following informed consent, plasma samples 
were taken from 2007 to 2014 from 557 adults living in a malaria-endemic region 
within Kilifi County on the coast of Kenya. The study was approved by the Kenya 
Medical Research Institute Ethics Review Committee and the Oxford Tropical 
Research Ethics Committee. IEs from three parasite isolates were separately stained 
with 10g ml~! DAPI, 200j1g ml! ethidium bromide or 6.7x SYBR Green I for 
1h at room temperature. The stained parasites were washed five times, mixed in 
equal proportions, and diluted to a 5% haematocrit in incomplete RPMI medium. 
Ten microlitres of the parasite mixture was rotated with 2.5 1l of adult plasma for 
1.5h at room temperature, and agglutinates formed were examined by fluorescence 
microscopy. In the primary screen, pools of five adult plasma were tested against 
six Kenyan isolates (in two separate reactions). Pools that formed mixed-colour 
agglutinates were identified and individual plasma within these pools were tested 
against nine isolates using the same assay. A single isolate (10668) was not detected 
in mixed agglutinates formed by any of the plasma and was therefore excluded from 
the study. Two adults (donors C and D) with plasma that formed mixed aggluti- 
nates with eight parasite isolates were selected for monoclonal antibody isolation 
and, following further informed consent, an additional blood sample was taken 
from each individual in February 2014. 

B-cell immortalization and isolation of monoclonal antibodies. IgG* mem- 
ory B cells were isolated from cryopreserved peripheral blood mononuclear cells 
(PBMCs) by magnetic cell sorting with mouse anti-CD19-PECy7 antibodies (BD 
Pharmingen, catalogue no. 341113) and mouse anti-PE microbeads (Miltenyi 
Biotec, catalogue no. 130-048-081), followed by flow cytometry sorting for IgG* 
IgM IgD~ cells. The B cells were immortalized with Epstein-Barr virus (EBV) in 
the presence of CpG-DNA (2.5,.gml-') and irradiated feeder cells as described 
previously’. Two weeks post-immortalization, culture supernatants were tested for 
the ability to stain IEs from eight parasite isolates by flow cytometry. Cryopreserved 
IEs were thawed, stained with 10x SYBR Green I, and incubated with the B-cell 
supernatants for 1h at 4°C. Antibody binding was detected using 2.5,.gml! of 
goat Alexa Fluor 647-conjugated anti-human IgG (Jackson ImmunoResearch, 
catalogue no. 109-606-170). Reactivity was calculated based on the percentage of 
late-stage parasites (high SYBR Green) recognized by each antibody. 

Sequence analysis of antibody cDNA and genomic DNA. cDNA was synthe- 
sized from selected B-cell cultures and both heavy chain and light chain varia- 
ble regions (VH and VL) were sequenced as previously described”*. The usage 
of VH and VL genes and the number of somatic mutations were determined by 
analysing the homology of VH and VL sequences of monoclonal antibodies to 
known human V, D and J genes in the IMGT database’. Genomic DNA was 
isolated from two B-cell lines of donor C and one B-cell line of donor D with 
a commercial kit (QIAGEN), and antibody-encoding sequences were amplified 
and sequenced with primers specific for the V and J regions of the given anti- 
body. Sequences were aligned with ClustalW2 (ref. 24). Potential cryptic RSS 
sites were identified using the RSSsite web server”*. To determine the heterozy- 
gosity of LAIRI on chromosome 19, the following primers were used to perform 
PCRs on genomic DNA: LAIR1_INTR_FW1, GGCGGTGGGCACTCAGGTTC; 
LAIR1_INTR_REV1, CACAGGCAGTCACCGGGTCTAGG; LAIR1_INTR_ 
FW2, GGATGCACCATGTCACCCAGTCCTGG. Genomic DNA isolated from 
PHA- and IL-2-stimulated T cells from donor C was used as a control for sequence 
analysis. 

Immunoglobulin lineage and genealogy analysis. Unmutated common ances- 
tor (UCA) sequences of the VL region were inferred with Antigen Receptor 
Probabilistic Parser (ARPP) UA Inference software, as previously described”°. 
UCA sequences of the VH region were constructed using IMGT/V-QUEST” and 
the genomic insert sequences. Nucleotide sequences of the mutated antibodies 
and the UCA were aligned using ClustalW2 (ref. 24), and phylogenetic trees were 


generated with the DNA Maximum Likelihood program (Dnaml) of the PHYLIP 
package, version 3.69 (refs 27, 28). 

Production of recombinant antibodies, antibody variants and fusion 
proteins. Antibody heavy and light chains were cloned into human IgG1, Igk. 
and Ig) expression vectors” and expressed by transient transfection of Expi293F 
Cells (ThermoFisher Scientific) using polyethylenimine (PEI). Cell lines were 
routinely tested for mycoplasma contamination. The antibodies were affinity 
purified by protein A chromatography (GE Healthcare). Variants of the MGD21 
antibody were produced by (1) exchanging Vy, Dy, Ju elements or the light chain 
with the corresponding sequences of an irrelevant antibody (F1499, reactive to 
influenza virus”), (2) deleting selected segments, or (3) reverting somatic muta- 
tions to the germline configuration with reference to the IMGT database and the 
original LAIRI genomic sequence (NCBI reference sequence NC_018930.2). In 
addition, LAIR1-Fc fusion proteins were produced recombinantly by cloning 
the mutated or unmutated LAIR1 fragment into a plasmid designed for expres- 
sion of human IgG1 fusion proteins (pPINFUSE-hIgG1-Fc2, Invivogen). On the 
basis of an alignment of the most potent LAIR1-containing antibodies with the 
unmutated LAIR1 sequence, five key residues that could contribute to gain of 
binding to IEs and loss of binding to collagen were identified and added alone 
or in various combinations to the unmutated LAIR1-Fc fusion protein. The 
MGD21 constructs and LAIR1 domain mutants were tested for staining of 3D7 
IEs that were enriched for MGD21 recognition (3D7-MGD21*). For the LAIR1 
domain mutants, MFI values at 1 ug ml! antibody concentration were calculated 
by interpolation of binding curves fitted to a linear regression model (Graphpad 
Prism 6). 

ELISA. Total IgGs were quantified using 96-well MaxiSorp plates (Nunc) coated 
with goat anti-human IgG (SouthernBiotech, catalogue no. 2040-01) using 
Certified Reference Material 470 (ERMs-DA470, Sigma-Aldrich) as a standard. 
To test binding to human collagen type I, ELISA plates were coated with 51g ml! 
of type I recombinant human collagen (Millipore, catalogue no. CC050), blocked 
with 1% bovine serum albumin (BSA) and incubated with titrated antibodies, 
followed by AP-conjugated goat anti-human IgG, Fc~ fragment specific (Jackson 
ImmunoResearch, catalogue no. 109-056-098). Plates were then washed, substrate 
(p-NPP, Sigma) was added and plates were read at 405 nm. 
Immunoprecipitation and LC-MS. Erythrocyte ghosts were prepared by hypo- 
tonic lysis with 1x PBS diluted 15-fold in water, and ghost membranes were dis- 
solved in a reducing lysis buffer containing 2% SDS, 10 mM dithiothreitol (DTT), 
10mM HEPES pH 8, sonicated and boiled. Solubilized proteins were alkylated with 
iodoacetamide (final concentration 551M) for 30 min at room temperature and 
precipitated with 80% acetone overnight at 4°C. The precipitates were resuspended 
in urea and digested with trypsin. For immunoprecipitation experiments, IEs were 
sonicated and dissolved in 7.2 M urea in RIPA buffer (1% Triton X-100, 0.1% SDS, 
0.5% sodium deoxycholate in HBS pH 7.4). The samples were centrifuged and 
supernatants were diluted 6.7-fold with RIPA buffer containing a protease inhibitor 
cocktail (Sigma-Aldrich) and incubated with 101g of MGD21 or BKC3 overnight 
at 4°C. Next, Protein G-Sepharose beads (GE Healthcare) were added and samples 
were incubated for 1h at 4°C. The beads were washed four times and immunopre- 
cipitates were digested directly on the beads with trypsin. After trypsin digestion, 
peptides were analysed on a Q-Exactive instrument at the Functional Genomics 
Center in Zurich. Raw files were analysed using the MaxQuant software”? and 
MS/MS spectra were searched against the human and P falciparum 3D7 UniProt 
FASTA databases (UP000005640 and UP000001450). Peptide identifications were 
matched across several replicates. Subsequent data analysis was performed in the 
R statistical computing environment. Missing values were imputed with a normal 
distribution around an LFQ value of 21. Statistical significance was evaluated by 
Welch tests. 

Western blots. Ghosts and immunoprecipitates were dissolved in 2x SDS sample 
buffer (Bio-Rad) and run on a 12% polyacrylamide gel under non-reducing condi- 
tions. The proteins on the gel were transferred onto a PVDF membrane, which was 
blocked with 5% milk in TBS with 0.1% Tween (TBST) for 1h at room temperature. 
The membrane was incubated with 5\1gml~' MGD21 overnight at 4°C, washed 
with TBST, and developed with horseradish peroxidase (HRP)-conjugated sheep 
anti-human IgG (GE Healthcare, catalogue no. NA933) used in combination with 
a chemiluminescent substrate. 

Expression of RIFINs. Genes encoding the A-RIFINs PF3D7_1400600, 
PF3D7_1040300, PF3D7_0100400, PF3D7_0100200 and PF3D7_1100500 
were produced by gene synthesis (Genscript) and cloned into the pDisplay 
vector (Invitrogen), which contains a haemagglutinin (HA) tag, as previously 
described?. RIFIN chimaeras containing the constant region of PF3D7_1400600 
(residues 38-146) and the variable region of PF3D7_0100200 (residues 
151-288) (PF3D7_1400600c_0100200v), or containing the constant region of 
PF3D7_0100200 (residues 42-150) and the variable region of PF3D7_1400600 
(residues 147-325) (PF3D7_0100200c_1400600v), were generated. The pDisplay 
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constructs were transiently transfected into CHOK1-SV cells (GS-System, Lonza) 
using PEI. Cell lines were routinely tested for mycoplasma contamination. Briefly, 
1 day before transfection, CHOK1-SV cells were seeded at 0.5 x 10° cells ml! in 
30ml CD-CHO medium (Invitrogen) supplemented with 2mM L-glutamine in 
125 ml Erlenmeyer flasks (Corning). On the day of transfection, 201g DNA was 
diluted in OPTI-PRO SFM Medium (Invitrogen) and mixed with 200j1g PEI 
for 20 min at room temperature. The DNA-PEI complexes were added to the 
cells, which were cultured in a CO) shaker incubator at 37°C, 135r.p.m. After 
72h, the expression of RIFINs and their recognition by the LAIR1-containing 
antibodies were tested by flow cytometry. Briefly, 51g ml! of rabbit anti-HA tag 
and 21g ml! of MGC or MGD antibodies were added to the RIFIN-transfected 
cells. Antibody binding was detected by 5 »g ml! of Alexa Fluor 488-conjugated 
goat anti-rabbit IgG (Life Technologies, catalogue no. A11034) and 2.5,.g ml! 
of Alexa Fluor 647-conjugated goat anti-human IgG (Jackson ImmunoResearch, 
catalogue no. 109-606-170). Dead cells were excluded by staining with 7-AAD 
(BD Biosciences). 

Inhibition of parasite growth. 3D7-MGD21* (5% parasitaemia, ring stage) was 
cultured with various concentrations of MGD21 or BKC3 for 2 days. After 2 days, 
10x SYBR Green I was added to aliquots of each culture and parasitaemia was 
quantified by flow cytometry. The remaining parasites in each culture were washed 
to remove the antibodies and incubated for 1 day to allow the parasites to reach the 
late trophozoite/schizont stage. MGD21 recognition of these cultures was detected 
using 2.5j1g ml! of Alexa Fluor 647-conjugated goat anti-human IgG (Jackson 
ImmunoResearch, catalogue no. 109-606-170). 

Inhibition of rosetting. 9605-MGD21* IEs at the late trophozoite/schizont stage 
were purified from uninfected erythrocytes and ring-stage parasites using a mag- 
netic column (Miltenyi Biotec) and were resuspended in culture medium with 
10% human serum. The purified IEs were incubated with 10gml~' of MGD21 or 
BKC3 for Lh at 4°C, mixed with O* erythrocytes or At erythrocytes in a 1:20 ratio, 
and incubated for 30 min at room temperature to allow rosetting to occur. The IEs 
were stained with 10x SYBR Green I, and the number of rosettes formed by at least 
200 IEs was counted by fluorescence microscopy to calculate the rosetting rate. 
Agglutination with monoclonal antibodies. 3D7-MGD21* and 11019-MGD21* 
IEs(4—5% parasitaemia) were diluted to a 3% haematocrit in a 5x SYBR Green I 
solution containing 51g ml“! of the test monoclonal antibody. Each sample was 
rotated for 1h at room temperature and subsequently examined by fluorescence 
microscopy. 
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Opsonic phagocytosis by monocytes. IEs were stained with 10,.gml~! DAPI 
for 30 min at room temperature, washed four times and run on a magnetic col- 
umn (Miltenyi Biotec) to purify late-stage parasites. The purified parasites were 
opsonized with serially diluted antibodies for 1h at 4°C. Monocytes were iso- 
lated from fresh PBMCs of healthy donors using mouse anti-CD14 microbeads 
(Miltenyi Biotec, catalogue no. 130-050-201) and mixed with the opsonized par- 
asites in a 1:2 ratio for 1 h at 37°C. Extracellularly bound, non-internalized IEs 
were lysed by treatment with red blood cell lysis solution (Miltenyi Biotec) for 
10 min at room temperature. The cells were stained with mouse anti-CD14-PECy5 
(Beckman Coulter, catalogue no. A07765) and analysed by flow cytometry. The 
mean fluorescence intensity (MFI) of DAPI in CD14* cells was used as a measure 
of phagocytosis of IEs by monocytes. 

Statistics. The Wilcoxon signed-rank test was used for statistical comparisons of 
pairs of data groups in rosetting experiments. No statistical methods were used to 
predetermine sample size. 
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Extended Data Figure 1 | Nucleotide sequence alignments of VH regions of antibodies isolated from donor C. Dots indicate positions where the 
nucleotide of a mature antibody is identical to that of the UCA. 
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Extended Data Figure 2 | Nucleotide sequence alignments of VH regions of antibodies isolated from donor D. Dots indicate positions where the 
nucleotide of a mature antibody is identical to that of the UCA. 
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Extended Data Figure 3 | Protein sequence alignments of VH regions of antibodies isolated from donors C and D. a, Donor C. b, Donor D. Putative 
complementarity-determining regions (CDRs) are highlighted in red. Dots indicate positions where the amino acid of a mature antibody is identical to 
that of the UCA. 
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Extended Data Figure 4 | Nucleotide sequence alignments of VL regions _ from donor D use VK1-8/JK5 (d). Complementarity-determining 
of antibodies isolated from donors C and D. a-d, Antibodies from donor _ regions (CDRs) are highlighted in red. Dots indicate positions where the 
C use VL7-43/JL3 (a), VK4-1/JK2 (b), or VK1-5/JK2 (c), while antibodies nucleotide of a mature antibody is identical to that of the UCA. 
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Extended Data Figure 5 | Genealogy trees generated from VL and VL(1), VL(2) and VL(3) refer to VL7-43/JL3, VK1-5/JK2 and VK4-1/JK2, 
LAIRI1 exon sequences. a—d, The trees were drawn based on the somatic respectively. Shown are the nucleotide and amino acid substitutions, with 
mutations in light chain variable regions (a, b) or LAIR1 exons (c, d) of the latter in parentheses. 


the antibodies isolated from donors C and D. In the donor C VL trees, 
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Extended Data Figure 6 | Genomic DNA analysis of LAIR1-containing the corresponding region of chromosome 13 from gDNA. The sequence 
antibodies of donor C and donor D. a, The sequence alignment of genomic —_ maintained in the mature antibody mRNA is boxed and the splice donor site 
DNA (gDNA) and cDNA ofa LAIR1-containing antibody (AB) from donor _ is highlighted in yellow. d, Alignment of gDNA and cDNA reveals that a part 
C (DonC) reveals a 507 bp LAIR1 insert in chromosome 14 (Chr14) and the of the chromosome 13 region and the entire inserted 5’ LAIR1 intron are 


removal of a 160 bp fragment by RNA splicing. Splice donor and acceptor removed by RNA splicing. Splice donor and acceptor sites are highlighted 
sites are highlighted in yellow. b, Schematic overview of the genomic in yellow. e, Alignment of the two repeated elements found in the inserted 
organization of a LAIR1-containing antibody from donor D, not to scale. LAIR1 intron in chromosome 14 with the corresponding sequence in 


c, Alignment of a region of antibody-encoding DNA (chromosome 14) with chromosome 19. The repeats are named R1 and R2, and K=G/A. 
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(Score) 12-cryptic-RSS 23-cryptic-RSS-antiparallel (Score) 


MGC_Chr19 (-57.29) TCTGCAGTGATGAGAATCACATGCACGTAGAA......... GAGCTGCTGGTGAAAGGTGAGGACGTCACCTGGGCCCTG (-79.68) 
MGD_Chr19 (-62.84) TTGTGAGCAAGTCTCAGGGTCCTCACTGTCAACTG......... CTGGGCCCTGCCCCAGTCTCAGCTCGACCCTCGAGCTTGTCCCCAGG (-77.42) 


MGD_Chr13 (-64.12) ATTCAAGTCTCAGAGGGACACCAGTGTGTTT.......... TGACAAGTGGGGTCTTGGAGTTCTTTAATTTTCCCATTGA (-75.63) 


Extended Data Figure 7 | LAIR1 and chromosome 13 inserts are flanked _ the RSSsite web server. The sequences shown begin from the ends of the 
by 12/23 cryptic RSS sites. The regions on chromosome 19 (Chr19) and inserts. Cryptic RSS sites are highlighted in grey, with complementary 
chromosome 13 (Chr13) of donor-derived gDNA corresponding to the ends underlined and prediction scores shown in parentheses. 

ends of the inserts were sequenced and RSS sites were identified using 
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Extended Data Figure 8 | Both LAIR1 alleles on chromosome 19 are 
intact in B cells producing LAIR1 antibodies. Heterozygosity of the 
chromosome 19 LAIRI1 exon in cells from donor C showing that both 
LAIR1 alleles are intact in B cells producing LAIR1-containing antibodies. 
Displayed are the chromatograms obtained for B-cell clones with or 
without a LAIR] insertion (LAIR1* or LAIR1~ B cell) and for polyclonal 
T cells. Y=C/T 
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Extended Data Figure 9 | See next page for figure caption. 
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Extended Data Figure 9 | Reactivity and functional assays of MGC 
and MGD antibodies. a, MGD21 staining of 3D7 IEs that were enriched 
or depleted of MGD21 reactivity (representative of n =3 independent 
experiments). WT, wild type. b, Heat map from LC-MS analysis 

showing RIFIN expression levels (calculated as intensity-based absolute 
quantification (iBAQ) scores) in erythrocyte ghosts prepared from 3D7- 
MGD21* and 3D7-MGD217 IEs (two experiments shown). Grey boxes 
indicate that expression levels are below the detection limit. c, Shown is 
the percentage of IEs (representative of n = 2 independent experiments) 
or of transfected CHO cells (n = 1) stained by the antibodies. RIFINs that 
were enriched in 3D7-MGD21* ghosts are highlighted blue, while RIFINs 
that were similarly expressed or not detected in 3D7-MGD21~ and 3D7- 
MGD21* ghosts are shown in red. BKC3 is a negative control antibody. 
d, Western blot showing MGD21 binding to immunoprecipitates (IP) 
prepared from 9605-MGD21~ and 9605-MGD21* IEs (representative 

of n = 2 independent experiments). Specific bands are marked with an 
asterisk. Anti-human IgG was used as the secondary antibody, resulting in 
detection of antibodies used for immunoprecipitation alongside antigens 
of interest. For gel source data, see Supplementary Fig. 1. Numbers on 
left indicate kDa e, Percentage of 9605-MGD21~ and 9605-MGD21* IEs 
recognized by representative MGC and MGD antibodies (representative 
of n =2 independent experiments). f, Binding of MGD21 to CHO cells 
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transfected with RIFINs (PF3D7_1400600 and PF3D7_0100200), a RIFIN 
chimaera containing the constant region of PF3D7_0100200 and the 
variable region of PF3D7_1400600 (PF3D7_0100200c_1400600v), or the 
inverse chimaera (PF3D7_1400600c_0100200v) (n= 1). g, Binding of an 
Fc fusion protein containing the LAIR1 domain of MGD21 to CHO cells 
transfected with RIFINs or RIFIN chimaeras (n = 1). h, Parasitaemia of 
3D7-MGD21° in vitro culture after 2 days of incubation with various 
concentrations of MGD21 or an irrelevant antibody (BKC3) (n= 1). mAb, 
monoclonal antibody. i, Percentage of 3D7-MGD21* IEs recognized by 
MGD21 after 2 days of incubation with various concentrations of MGD21 
or BKC3. The antibodies were removed after 2 days (during the ring stage 
of the life cycle) and the parasites were allowed to grow for 24h to the 

late trophozoite/schizont stage before detection with MGD21 (n= 1). 

j, Rosetting of 9605-MGD21°* IEs with blood group O* or A* uninfected 
erythrocytes (uEs) after incubation with MGD21 or BKC3. Shown is the 
mean + s.d. from n= 4 independent experiments. Statistical significance 
was evaluated by the Wilcoxon signed-rank test (P > 0.1 for both blood 
groups). NS, not significant. k, Agglutinates of 3D7-MGD21* or 11019- 
MGD21" IEs formed by MGD21 or MGC34. Scale bar, 251m. 1, Opsonic 
phagocytosis of 11019-MGD21* IEs by monocytes (n = 2). The IEs were 
stained with DAPI, which was quantified in monocytes as a measure of 
phagocytosis. 
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Insulator dysfunction and oncogene activation in 


IDH mutant gliomas 
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Anat O. Stemmer-Rachamimov!, Mario L. Suva? & Bradley E. Bernstein 


Gain-of-function IDH mutations are initiating events that define 
major clinical and prognostic classes of gliomas’”. Mutant IDH 
protein produces a new onco-metabolite, 2-hydroxyglutarate, 
which interferes with iron-dependent hydroxylases, including the 
TET family of 5’-methylcytosine hydroxylases*-’. TET enzymes 
catalyse a key step in the removal of DNA methylation®®. IDH 
mutant gliomas thus manifest a CpG island methylator phenotype 
(G-CIMP)!%"1, although the functional importance of this altered 
epigenetic state remains unclear. Here we show that human IDH 
mutant gliomas exhibit hypermethylation at cohesin and CCCTC- 
binding factor (CTCF)-binding sites, compromising binding 
of this methylation-sensitive insulator protein. Reduced CTCF 
binding is associated with loss of insulation between topological 
domains and aberrant gene activation. We specifically demonstrate 
that loss of CTCF at a domain boundary permits a constitutive 
enhancer to interact aberrantly with the receptor tyrosine kinase 
gene PDGFRA, a prominent glioma oncogene. Treatment of IDH 
mutant gliomaspheres with a demethylating agent partially restores 
insulator function and downregulates PDGFRA. Conversely, 
CRISPR-mediated disruption of the CTCF motif in IDH wild-type 
gliomaspheres upregulates PDGFRA and increases proliferation. 
Our study suggests that IDH mutations promote gliomagenesis by 
disrupting chromosomal topology and allowing aberrant regulatory 
interactions that induce oncogene expression. 

The human genome is organized into topological domains that 
represent discrete structural and regulatory units'*. Such domains are 
evident in genome-wide contact maps generated by high-throughput 
chromatin conformation capture (HiC) techniques!%, and have been 
termed ‘topologically associated domains’ or ‘contact domains’!*"*. 
Recent studies have strengthened the role of the CTCF insulator 
protein in creating chromatin loops and boundaries that partition 
such domains!®. Genomic alterations that remove CTCF-associated 
boundaries allow aberrant enhancer-gene interactions and alter gene 
expression’”. 

Since CTCF binding is methylation-sensitive’”””’, its localization 
might be altered by DNA hypermethylation in JDH mutant glio- 
mas. We therefore used chromatin immunoprecipitation followed 
by high-throughput sequencing (ChIP-seq) to map CTCF binding 
genome-wide in 11 primary tumours and 4 glioma cell lines. Although 
CTCF binding patterns tend to be relatively stable, we detected highly 
overlapping subsets of CTCF sites that were lost in IDH mutants 
(Fig. 1a, b and Methods). Significantly more sites were commonly 
lost than gained (625 versus 300, P< 107!”). Whole-genome bisulfite 
sequencing data from The Cancer Genome Atlas (TCGA)" was used to 
assess the methylation status of 625 loci with reduced CTCF binding in 
mutant tumours. We found that these loci have higher GC content, and 
exhibit significantly higher levels of DNA methylation in [DH mutant 
gliomas relative to IDH wild type (Fig. 1c, d). 


18,19 


1,2;3 


We considered that altered DNA methylation and CTCF binding 
might disrupt topological domain boundaries and gene insulation 
in IDH mutant tumours. We collated a set of constitutive domain 
boundaries based on kilobase (kb)-resolution HiC maps!°. We then 
examined published RNA-seq expression data for 357 normal brain 
tissue samples”. Consistent with previous studies’®, we found that 
genes in the same domain correlate across samples, but that genes 
separated by a boundary show lower correlation (Fig. le). We next 
incorporated expression data for 230 IDH mutant (218 IDH1 mutant 
and 12 IDH2 mutant) and 56 wild-type lower-grade gliomas, generated 
by TCGA”. Here again we found that the presence of an intervening 
boundary reduces correlation between neighbouring genes. We next 
scanned the genome for pairs of proximal genes separated by less 
than 180 kb (the average contact domain size!) that correlate much 
more strongly in JDH mutants than in wild-type gliomas (Fig. 1f and 
Methods). Remarkably, the resulting set is strongly enriched for gene 
pairs that cross domain boundaries (90% versus 69% expected at ran- 
dom; P< 10~*). Conversely, gene pairs that correlate less strongly in 
IDH mutants are more likely to reside in the same domain (52% versus 
31% expected at random; P< 107°). Notably, CTCF knockdown has 
been shown to increase cross-boundary interactions and decrease intra- 
domain interactions”’. Thus, altered expression patterns in JDH mutant 
gliomas may reflect reduced CTCF binding and consequent disruption 
of domain boundaries and topologies. 

We next sought to pinpoint specific boundaries that were disrupted 
by IDH mutation. For all pairs of genes separated by <1 megabase 
(Mb), we computed their correlation across mutant and wild-type 
IDH gliomas. We then scanned for loci in which cross-boundary gene 
pairs correlate more strongly in mutant tumours (false discovery rates 
(FDR) < 1%), while intra-domain gene pairs correlate less strongly 
(FDR < 1%). This analysis highlighted 203 domain boundaries 
(Fig. 2a, Supplementary Table 1 and Methods). The putatively disrupted 
boundaries exhibit higher DNA methylation and lower CTCF binding 
in IDH mutant compared with wild-type tumours (Extended Data 
Fig. 1). These data suggest that the methylator phenotype disrupts 
CTCF binding and domain boundaries, thereby affecting gene expres- 
sion in IDH mutant gliomas. 

We hypothesized that altered domain topologies might contribute to 
gliomagenesis by activating oncogenes that are normally insulated by 
domain boundaries. We therefore scanned the domains adjacent to the 
disrupted boundaries for genes with higher expression in JDH mutant 
than in wild-type gliomas (Fig. 2a). Genes in top-scoring domains 
include PDGFRA (P< 1077), an established glioma oncogene”, and 
other candidate regulators of gliomagenesis (Supplementary Table 1). 

The identification of PDGFRA as a potential target of epigenetic 
deregulation in [DH mutants was of particular interest, given its promi- 
nence as a glioma oncogene and established roles for PDGFA signalling 
in the normal brain. Although PDGFRA is a frequent target of genomic 
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Figure 1 | CTCF binding and gene insulation compromised in IDH 
mutant gliomas. a, Binding profiles for the methylation-sensitive insulator 
CTCE are shown for a representative locus in IDH mutant and wild-type 
tumours, normalized by average signal. b, Scatterplot compares CTCF 
binding signals between IDH mutant (y axis) and wild-type (x axis) gliomas 
for all detected CTCF sites. A larger fraction of sites is commonly lost in all 
IDH mutants (n = 625) than gained (n = 300). c, Histogram compares GC 
content between CTCF sites that are lost or retained. d, Box plots show DNA 
methylation levels over lost CTCF sites, as determined by whole-genome 
bisulfite data for three IDH wild-type and three DH mutant tumours. 

e, Plot depicts average correlation between gene pairs as a function of 
distance across RNA-seq profiles for human brain’’. Gene pairs separated by 
a constitutive CTCF-bound boundary per HiC’® have lower correlations. 

f, Volcano plot depicts the significance (y axis) of gene pairs that are more 
(or less) correlated in JDH mutant than in wild-type (WT) lower-grade 
gliomas. Gene pairs with significantly increased correlations in [DH mutants 
(right) tend to cross boundaries (orange), while those with decreased 
correlations (left) are more likely reside in the same domain (blue). These 
data indicate that DH mutant, G-CIMP gliomas have reduced CTCF binding 
and altered expression patterns suggestive of defective gene insulation. 


amplification and gain-of-function mutations in glioblastoma (15%), 
such alterations are rare in [DH mutant tumours**. Nonetheless, 
IDH mutant gliomas strongly express PDGFRA (Fig. 2b), and share the 
proneural transcriptional program characteristic of PDGFRA-amplified 
tumours”*4, Closer examination of the expression patterns in IDH 
mutant gliomas reveals a marked correlation between PDGFRA and 
FIP1L1, despite an intervening boundary (Fig. 2c). FIPIL1 encodes 
an RNA-processing protein that is constitutively expressed in neural 
tissues, and particularly active in oligodendrocyte precursors, a putative 
glioma cell of origin?” (Extended Data Fig. 2a). Moreover, combined 
expression of PDGFRA and FIPI1L1 is associated with poorer outcome 
in IDH mutant lower-grade gliomas (Extended Data Fig. 2b). This sug- 
gests that an aberrant interaction with this constitutive locus may drive 
PDGFRA expression in [DH mutant tumours. 

We therefore investigated the topology of the region using 
kilobase-resolution HiC data!>. In all six cell types examined, PDGFRA 
and FIP1L1 reside in distinct domains, separated by one CTCF- 
anchored constitutive boundary (Fig. 3a and Extended Data Fig. 3). 
Our ChIP-seq data confirm that this boundary contains a strong CTCF- 
binding site over a canonical CTCF motif with a CpG dinucleotide 
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FIP1L1 expression (z score) 
Figure 2 | Topological domain boundaries disrupted in IDH mutant 
gliomas. a, Scatterplot depicts significance of deregulated boundaries 
in IDH mutant tumours (y axis) against fold change of most upregulated 
gene in adjacent domains (x axis). PDGFRA is adjacent to a significantly 
deregulated boundary and upregulated in DH mutants. b, Boxplots 
compare PDGFRA expression (left) or copy number (right) for 443 
glioblastoma tumours, classified by IDH status and expression subtype”. 
IDH mutants (red) have increased PDGFRA expression, despite normal 
copy number. c, Plots compare PDGFRA (y axis) and FIP1L1 (x axis) 
expression in IDH wild-type (left) and mutant (right) gliomas. The genes 
correlate specifically in IDH mutants, consistent with deregulation of the 
intervening boundary/insulator. 


FIP1L1 expression (z score) 


in a position previously linked to methylation-sensitivity” (Fig. 3b). 
Quantitative ChIP-PCR reveals that CTCF occupancy at this site is 
reduced between 30% and 50% in IDH mutant tumours and glioma- 
sphere models, relative to wild type (Fig. 3c, d). Moreover, the CpG in 
this motif becomes highly methylated in IDH mutants (Fig. 3e, f). This 
suggests that reduced CTCF binding may compromise the boundary 
flanking PDGFRA in IDH mutant, hypermethylated tumours. 

To identify regulatory elements that might underlie PDGFRA 
induction, we mapped the enhancer-associated histone modification, 
histone H3 lysine 27 acetylation (H3K27ac), in glioma specimens and 
models. We identified a large enhancer ~50 kb upstream of FIP1L1 
with strong acetylation in wild-type and mutant tumours (Fig. 3a and 
Extended Data Fig. 4). In support of an enhancer identity, the ele- 
ment is enriched for H3 lysine 4 mono-methylation (H3K4me1), but 
lacks H3K4me3, and contains conserved motifs bound by the glioma 
master transcription factors OLIG2 and SOX2. Although this enhancer 
is normally insulated from PDGFRA, we reasoned that disruption of 
the intervening boundary might allow it to interact with the oncogene 
in IDH mutant gliomas. To test this, we used chromosome confor- 
mation capture (3C) to query the relative frequencies with which the 
PDGFRA promoter interacts with the FIP1L1 enhancer, with an intra- 
genic PDGFRA enhancer, or with nearby control sites (Fig. 3g). We 
fixed IDH mutant and wild-type glioma specimens and gliomaspheres, 
digested their chromatin with HinDIII, and performed proximity liga- 
tion to re-ligate physically interacting DNA sequences. We used quanti- 
tative PCR (qPCR) to measure ligation frequencies between elements, 
normalizing against control ligations performed with bacterial artificial 
chromosome DNA. 

In wild-type gliomas, 3C revealed a strong interaction between the 
PDGFRA promoter and its intragenic enhancer, which are ~50 kb apart 
(Fig. 3j, k). In contrast, the PDGFRA promoter does not interact with 
the FIP1L1 enhancer in wild-type tumours, consistent with retention of 
the intervening boundary (Fig. 3h, i). However, the interaction patterns 
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Figure 3 | Insulator loss allows PDGFRA to interact with a constitutive 
enhancer. a, Contact domain structure shown for a 1.7-Mb region 
containing PDGFRA. Heat depicts HiC interaction scores between 
triangulated loci in IMR90 cells!°. Domains are visible as triangle-shaped 
regions of high interaction scores. Convergent CTCF sites anchor a loop 
that separates PDGFRA and FIP1L1 (black circle). H3K27ac and CTCF 
profiles are aligned to the contact map. Interaction trace (below) depicts 
HiC signals between the PDGFRA promoter and all other positions in the 
region. Genes, FIP1L1 enhancer (per H3K27ac) and insulator (per HiC 
and CTCF binding) are indicated. b, The right CTCF peak in the insulator 
contains a CTCF motif with a CpG at a methylation-sensitive position. 

c, d, ChIP-qPCR data show that CTCF occupancy over the boundary is 
reduced in IDH mutant (red) gliomas and models, relative to wild type 


were markedly different in JDH mutant tumours. Here, 3C revealed 
a strong interaction between the PDGFRA promoter and the FIPIL1 
enhancer, despite a separation of ~900kb (Fig. 3i). For comparison, 
this interaction is approximately fivefold stronger than that between 
the PDGFRA promoter and its intragenic enhancer. To confirm this 
interaction, we designed and normalized reciprocal probe and primers 
to compare the relative strength with which the FIPIL1 enhancer 
interacts with nearby promoters and PDGFRA (Extended Data 
Fig. 5). Notably, we found that the interaction between FIP1L1 enhancer 
and PDGFRA promoter in DH mutant tumours is stronger than that 
between FIPIL1 enhancer and FIP1L1 promoter. This suggests that 
disruption of a boundary element by DH mutation and hypermethyl- 
ation allows a potent constitutive enhancer to interact aberrantly with, 
and upregulate, PDGFRA. 
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(black). e, Methylation levels of the CpG in the CTCF motif were measured 
in gliomaspheres by bisulfite sequencing, and plotted as a percentage of 
alleles protected from conversion. f, Methylation levels of the CpG in 

the CTCF motif were measured in glioma specimens by methylation- 
sensitive restriction, and plotted as relative protection. g, Expanded views 
of the FIP1L1 enhancer locus and PDGFRA locus shown with H3K27ac 
tracks. Vertical black bars indicate the locations of the common PDGFRA 
promoter primer and four complementary primers tested in 3C. h-k, Plots 
show normalized 3C interaction frequencies between PDGFRA promoter 
and indicated regions. A strong interaction between the PDGFRA 
promoter and the FIP1L1 enhancer is evident in IDH mutant tumours and 
models. ND, none detected. Bars and error bars in all panels reflect mean 
and s.d. of triplicate observations, respectively. 


To test this model functionally, we considered whether perturbing 
the boundary alters PDGFRA expression in patient-derived glioma- 
spheres (Fig. 4a). First, we focused on the IDH1 mutant astrocytoma 
model, BT 142. In this mutant line, the CpG dinucleotide in the CTCF 
motif exhibits higher methylation than wild-type models (~13% versus 
~2% per bisulfite sequencing), and CTCF binding is roughly threefold 
lower. Consistently, 3C reveals a strong interaction between the FIPIL1 
enhancer and the PDGFRA promoter that is specific to the mutant line 
(Fig. 3i), and PDGFRA is highly expressed. 

We reasoned that a demethylating agent should reduce methylation 
at this CpG dinucleotide, allowing CTCF to bind and restore PDGFRA 
insulation. We therefore treated BT 142 gliomaspheres with the DNA 
methyltransferase inhibitor 5-azacytidine (5-aza). 5-aza treatment 
reduced methylation of the CTCF motif by ~2.5-fold, increased CTCF 
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Figure 4 | Boundary methylation and CTCF occupancy affect PDGFRA 
expression and proliferation. a, Schematic depicts chromatin loops and 
boundaries in the PDGFRA locus. In IDH wild-type cells (left), intact 
boundary insulates oncogene. Disruption of the boundary by removing 
the CTCF motif should activate the oncogene. In IDH mutant cells (right), 
hypermethylation blocks CTCF, compromising the boundary and allowing 
enhancer to activate the oncogene. Demethylation should restore CTCF- 
mediated insulation. meCpG, methylated CpG. b, Plot compares CpG 
methylation in the CTCF motif in IDH wild-type gliomaspheres (black), 
IDHI mutant gliomaspheres (red), and IDH1 mutant gliomaspheres treated 
with 51M 5-aza for 8 days (purple). c, Plot compares CTCF occupancy 
over the boundary. DMSO, dimethylsulfoxide; WCE, whole-cell extract. 

d, Plot compares PDGFRA expression. Demethylation restores PDGFRA 
insulation in IDH1 mutant gliomaspheres. e, CTCF binding shown for 

the FIPIL1/PDGFRA region. Expanded view shows CTCF motif in the 


occupancy by ~1.7-fold and downregulated PDGFRA expression by 
~5-fold (Fig. 4b-d). These results directly implicate DNA hyper- 
methylation in compromising CTCF binding, boundary function and 
oncogene insulation in [DH mutant tumours. 

Finally, we investigated whether genetic disruption of the CTCF 
motif could induce PDGFRA expression in wild-type gliomaspheres 
with an intact boundary (Fig. 4a). Here we focused on GSC6, a 
patient-derived glioblastoma model that contains an EGFR amplifi- 
cation, but is wild type for IDH1, IDH2 and PDGFRA. We sought to 
disrupt the CTCF site in the boundary by CRISPR (clustered regularly 
interspaced short palindromic repeats)-based genome engineering”*”’ 
(Fig. 4e). We designed a short guide RNA (sgRNA) with a protospacer 
adjacent motif (PAM) within the CTCF motif. A single-vector lentiviral 
delivery system was used to infect GSC6 cells with a Cas9 expression 
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Relative cell number (RLUs) *®* 


Days 
insulator targeted for CRISPR-based deletion. ssRNA and PAM direct 
Cas9 nuclease to the motif. f, Surveyor assay detects target site alterations 
in GSC6 gliomaspheres infected with Cas9 and sgRNA (but not in control 
cells infected with GFP-targeting sgRNA). g, Sequencing of target site 
reveals the indicated deletions. CTCF motif disrupted on ~25% of alleles 
(compare to <0.01% in control). h, Plot depicts fraction of reads in insulator 
CRISPR cells with a deletion of indicated size. i, qPCR reveals increased 
PDGFRA expression in insulator CRISPR cells. j, Flow cytometry reveals 
~2-fold greater PDGFRa in insulator CRISPR cells. PE, phycoerythrin. 
k, Plot depicts gliomasphere growth. Insulator CRISPR cells exhibit an 
approximately twofold increased proliferation, relative to control. This 
proliferation advantage is eliminated by PDGFRa inhibition. RLUs, relative 
light units. These results indicate that genetic or epigenetic disruption of the 
boundary compromises insulation of this oncogene. Bars and error bars in 
all panels reflect mean and s.d. of triplicate observations, respectively. 


construct containing this insulator sgRNA or a control sgRNA (target- 
ing green fluorescent protein, GFP). Surveyor assay confirmed target 
locus disruption in the insulator CRISPR condition (Fig. 4f). Direct 
sequencing of the target locus revealed that ~25% of alleles in the insu- 
lator CRISPR gliomaspheres contain a deletion within the CTCF motif 
expected to disrupt binding, compared to <0.1% in the GFP control 
(Fig. 4g, h). 

We quantified PDGFRA expression in the genetically modified gli- 
omaspheres. Reverse transcription PCR (RT-PCR) revealed an ~1.6- 
fold increase in PDGFRA messenger RNA in the insulator CRISPR 
cells, relative to control (Fig. 4i). Similarly, flow cytometry revealed 
an ~1.8-fold increase in the fraction of cells with PDGFRa surface 
expression (Fig. 4j). We conservatively estimate that CTCF motif dis- 
ruption causes an ~3-fold increase in PDGFRA expression, given that 
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DNA level analysis indicates that less than 50% of insulator CRISPR 
cells were successfully edited. 

Finally, we considered whether CRISPR-mediated boundary disrup- 
tion and PDGFRA induction affects gliomasphere fitness. In support, 
the insulator CRISPR gliomaspheres have an approximately two- 
fold growth advantage over the control GFP CRISPR gliomaspheres 
(Fig. 4k). This growth advantage is dependent on PDGFRa signalling, 
as it is abrogated by treatment with the PDGFR inhibitors dasatinib 
or crenolanib (Fig. 4k and Extended Data Fig. 6). Notably, PDGFRA 
expression in insulator CRISPR gliomaspheres increased further after 
extended culture (twofold increase compared with control), potentially 
owing to selection of effectively edited clones. The observation that 
genetic disruption of this CTCF boundary element induces PDGFRA 
expression and enhances proliferation provides strong support for our 
model that epigenetic disruption of this element offers similar growth 
advantage to DH mutant gliomas. 

In conclusion, we present a new epigenetic mechanism by which 
gain-of-function IDH mutations induce PDGFRA expression and 
thereby promote fitness in a subset of gliomas. We specifically find 
that, in addition to familiar effects on CpG islands, [DH mutations 
cause hypermethylation of CTCF binding sites genome-wide. This is 
associated with reduced CTCF binding and a global deregulation of 
boundary elements that partition topological domains. Disruption of 
a specific boundary bordering PDGFRA allows a potent enhancer to 
contact and activate this canonical glioma oncogene aberrantly. 

Although disruption of this single boundary confers a growth advan- 
tage, it is unlikely to be the only mediator of IDH mutations in gliomas. 
The widespread disruption of CTCF binding and boundary element 
function could provide many opportunities for oncogene deregula- 
tion, and subsequent selection of proliferative progeny that inherit the 
altered epigenetic state. Insulator dysfunction may also be accompanied 
by promoter silencing events”*”®, and by alterations to other pathways 
affected by 2-hydroxyglutarate”*°. Conversely, disruption of chro- 
mosomal topology and oncogene insulation may be more generally 
relevant to methylator phenotypes observed in colorectal and renal cell 
carcinomas, leukaemia and other malignancies”®. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 6 July; accepted 26 November 2015. 
Published online 23 December 2015. 


1. Parsons, D. W. et a/. An integrated genomic analysis of human glioblastoma 
multiforme. Science 321, 1807-1812 (2008). 

2. The Cancer Genome Atlas Research Network. Comprehensive, integrative 
genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372, 
2481-2498 (2015). 

3. Dang, L. et al. Cancer-associated IDH1 mutations produce 2-hydroxyglutarate. 
Nature 462, 739-744 (2009). 

4. Figueroa, M. E. et al. Leukemic IDH1 and IDH2 mutations result in a 
hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic 
differentiation. Cancer Cell 18, 553-567 (2010). 

5. Xu, W. et al. Oncometabolite 2-hydroxyglutarate is a competitive inhibitor of 
a-ketoglutarate-dependent dioxygenases. Cancer Cell 19, 17-30 (2011). 

6. Lu, C. eta/. IDH mutation impairs histone demethylation and results in a block 
to cell differentiation. Nature 483, 474-478 (2012). 

7. Cairns, R. A. & Mak, T. W. Oncogenic isocitrate dehydrogenase mutations: 
mechanisms, models, and clinical opportunities. Cancer Dis. 3, 730-741 (2013). 

8. Pastor, W. A., Aravind, L. & Rao, A. TETonic shift: biological roles of TET proteins 
in DNA demethylation and transcription. Nature Rev. Mol. Cell Biol. 14, 
341-356 (2013). 


114 | NATURE | VOL 529 | 7 JANUARY 2016 


9. Kohli, R. M. & Zhang, Y. TET enzymes, TDG and the dynamics of DNA 
demethylation. Nature 502, 472-479 (2013). 

10. Noushmehr, H. et al. Identification of a CpG island methylator phenotype that 

defines a distinct subgroup of glioma. Cancer Cell 17, 510-522 (2010). 

11. Turcan, S. et a/. IDH1 mutation is sufficient to establish the glioma 

hypermethylator phenotype. Nature 483, 479-483 (2012). 

12. Bickmore, W. A. & van Steensel, B. Genome architecture: domain organization 

of interphase chromosomes. Cel! 152, 1270-1284 (2013). 

13. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions 

reveals folding principles of the human genome. Science 326, 289-293 

(2009). 

14. Dixon, J. R. et al. Topological domains in mammalian genomes identified by 

analysis of chromatin interactions. Nature 485, 376-380 (2012). 

15. Rao, S. S. et al. A3D map of the human genome at kilobase resolution reveals 

principles of chromatin looping. Cel/ 159, 1665-1680 (2014). 

16. Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the 

X-inactivation centre. Nature 485, 381-385 (2012). 

17. Lupiafiez, D. G. et al. Disruptions of topological chromatin domains cause 

pathogenic rewiring of gene-enhancer interactions. Cel! 161, 1012-1025 

(2015). 

18. Bell, A. C. & Felsenfeld, G. Methylation of a CTCF-dependent boundary controls 
imprinted expression of the /gf2 gene. Nature 405, 482-485 (2000). 

19. Hark, A. T. et al. CTCF mediates methylation-sensitive enhancer-blocking 
activity at the H19/lgf2 locus. Nature 405, 486-489 (2000). 

20. The GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: 
multitissue gene regulation in humans. Science 348, 648-660 (2015). 

21. Zuin, J. et al. Cohesin and CTCF differentially affect chromatin architecture and 
gene expression in human cells. Proc. Nat! Acad. Sci. USA 111, 996-1001 
(2014). 

22. Sturm, D. et al. Paediatric and adult glioblastoma: multiform (epi)genomic 
culprits emerge. Nature Rev. Cancer 14, 92-107 (2014). 

23. Brennan, C. W. et al. The somatic genomic landscape of glioblastoma. Cel/ 155, 
462-477 (2013). 

24. Verhaak, R. G. et al. Integrated genomic analysis identifies clinically relevant 
subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, 
EGFR, and NF1. Cancer Cell 17, 98-110 (2010). 

25. Wang, H. et al. Widespread plasticity in CTCF occupancy linked to DNA 
methylation. Genome Res. 22, 1680-1688 (2012). 

26. Hsu, P. D., Lander, E. S. & Zhang, F. Development and applications of 
CRISPR-Cas9 for genome engineering. Cel/ 157, 1262-1278 (2014). 

27. Sander, J. D. & Joung, J. K. CRISPR-Cas systems for editing, regulating and 
targeting genomes. Nature Biotechnol. 32, 347-355 (2014). 

28. Baylin, S. B. & Jones, P. A. A decade of exploring the cancer epigenome — 
biological and translational implications. Nature Rev. Cancer 11, 726-734 
(2011). 

29. Costello, J. F., Berger, M. S., Huang, H. S. & Cavenee, W. K. Silencing of 
p16/CDKN2 expression in human gliomas by methylation and chromatin 
condensation. Cancer Res. 56, 2405-2410 (1996). 

30. Koivunen, P. et al. Transformation by the (R)-enantiomer of 2-hydroxyglutarate 
linked to EGLN activation. Nature 483, 484-488 (2012). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank J. Kim, the MGH Neuro Oncology Tissue 
Repository, and the MGH Pathology Flow Cytometry Core for assistance with 
clinical samples and analysis, and E. Lander and W. Kaelin for discussions. 
W.A.F. is supported by a basic research fellowship from the American Brain 
Tumor Association. B.B.L. is supported by a Jane Coffin Childs fellowship. B.E.B. 
is an American Cancer Society Research Professor. This research was supported 
by funds from Howard Hughes Medical Institute, the National Brain Tumor 
Society and the National Human Genome Research Institute. 


Author Contributions Conception and experimental design: W.A.F, Y.D., B.B.L., 
S.M.G., M.L.S. and B.E.B. Methodology and data acquisition: W.A.F., Y.D., B.B.L., 
S.M.G., A.S.V., A.0.S.-R., M.L.S. and B.E.B. Analysis and interpretation of data: 
W.A.F., Y.D. and B.E.B. Manuscript writing: W.A.F., Y.D. and B.E.B. W.A.F. and Y.D. 
contributed equally to this work. 


Author Information Data generated for this study are available through the 
Gene Expression Omnibus (GEO) under accession number GSE70991. Reprints 
and permissions information is available at www.nature.com/reprints. The 
authors declare no competing financial interests. Readers are welcome to 
comment on the online version of the paper. Correspondence and requests for 
materials should be addressed to B.E.B. (bernstein.bradley@mgh.harvard.edu). 


© 2016 Macmillan Publishers Limited. All rights reserved 


METHODS 


No statistical methods were used to predetermine sample size. 

Primary glioma specimens and gliomasphere models. Clinical samples GBM1w, 
GBM2w, GBM3w, GBM4w, GBM5w, GBM6w, GBM7w, AA15m, AA16m, 
AA17m, OD18 m and AA19 m were obtained as frozen specimens from the 
Massachusetts General Hospital Pathology Tissue Bank, or received directly after 
surgical resection and flash frozen (Extended Data Table 1). All samples were 
acquired with Institutional Review Board approval, and were de-identified before 
receipt. GBM1w was obtained at autopsy; the remaining samples were surgical 
resections. [DH status was determined for all clinical samples by SNaPshot multi- 
plex PCR*!. PDGFRA status was confirmed by FISH analysis. Tissue (200-500 1g) 
was mechanically minced with a sterile razor blade before further processing. 

Gliomaspheres were maintained in culture as described***. In brief, neurosphere 

cultures contain Neurobasal media supplemented with 20ng ml“! recombinant 
EGF (Rand D Systems), 20ng ml! FGF2 (Rand D Systems), 1x B27 supplement 
(Invitrogen), 0.5 N2 supplement (Invitrogen), 3mM t-glutamine, and penicillin/ 
streptomycin. Cultures were confirmed to be mycoplasma-free via PCR methods. 
GSC4 and GSC6 gliomasphere lines were derived from IDH wild-type tumours 
resected at Massachusetts General Hospital, and have been previously described 
and characterized*?**, BT 142 gliomasphere line (IDH1 mutant)*? was obtained 
from ATCC, and cultured as described above except 25% conditioned media was 
carried over each passage. BT 142 G-CIMP status was confirmed by evaluating LINE 
methylation with the Global DNA Methylation Assay - LINE-1 kit (Active Motif), 
as described*®, and by methylation-sensitive restriction digests. GSC119 was derived 
from an [DH1 mutant tumour (confirmed by SNaPshot) resected at Massachusetts 
General Hospital. We confirmed IDH1 mutant status of GSC119 by RNA-seq 
(82 out of 148 reads overlapping the relevant position in the transcript correspond 
the mutant allele). The gliomasphere models were derived from tumours of the 
following types: GSC4 and GSC¢6: primary glioblastoma; BT 142: grade III oligo- 
astrocytoma; GSC119: secondary glioblastoma, G-CIMP. Clinical specimens and 
models used in this study are detailed in Extended Data Table 1. 
ChIP. ChIP-seq was performed as described previously™. In brief, cultured cells 
or minced tissue was fixed in 1% formaldehyde and snap frozen in liquid nitrogen 
and stored at —80°C at least overnight. Sonication of tumour specimens and glio- 
maspheres was calibrated such that DNA was sheared to between 400 and 2,000 bp. 
CTCF was immunoprecipitated with a monoclonal rabbit CTCF antibody, clone 
D31H2 (Cell Signaling 3418). H3K27ac was immunoprecipitated with an antibody 
from Active Motif (39133). ChIP DNA was used to generate sequencing libraries 
by end repair (End-It DNA repair kit, Epicentre), 3’ A base overhang addition via 
Klenow fragment (NEB), and ligation of barcoded sequencing adapters. Barcoded 
fragments were amplified via PCR. Libraries were sequenced as 38-base paired-end 
reads on an Illumina NextSeq500 instrument or as 50-base single-end reads on 
a MiSeq instrument. Sequencing libraries are detailed in Extended Data Table 2. 
H3K27ac maps for GSC6 were previously deposited to the GEO under accession 
GSM1306340. Genomic data has been deposited into GEO as GSE70991. 

For sequence analysis, identical reads were collapsed to a single paired-end read 
to avoid PCR duplicates. To avoid possible saturation, reads were downsampled to 
5% reads collapsed as PCR duplicates, or 5 million fragments. Reads were aligned 
to hg19 using BWA, and peaks were called using HOMER. ChIP-seq tracks were 
visualized using Integrative Genomics Viewer (IGV, http://www.broadinstitute. 
org/igv/). To detect peaks lost in IDH mutants, we called signal over all peaks ina 
100-bp window centred on the peaks. To control for copy number changes, we first 
called copy number profiles from input sequencing data using CNVnator*”. We 
then removed all regions where at least one sample had a strong deletion (<0.25), 
and normalized by copy number. To account for batch effects and difference in 
ChIP efficiency, we quantile normalized each data set. Peaks were scored as lost or 
gained if the difference in signal between a given tumour and the average of the five 
wild-type tumours was at least twofold lower or higher, with a signal of at least 1 in 
all wild-type or IDH mutant tumours. Fisher exact test confirmed that the overlap 
between peaks lost in the DH mutant tumours is highly significant (P< 10~'). 

GC content over CTCF peaks lost (or retained) in the [DH mutant glioma 
specimens was averaged over 200-bp windows centred on each peak lost in IDH 
mutant tumours. Methylation levels were quantified over these same regions for 
3 IDH mutant and 3 IDH wild-type tumours, using TCGA data generated by whole 
genome bisulfite sequencing”. In brief, methylation levels (percentage) based on 
proportion of reads with protected CpG were averaged over all CpG di-nucleotides 
in these regions, treating each tumour separately. 

Occupancy of the CTCF site in the boundary element adjacent to the 
PDGFRA locus was quantified by ChIP qPCR, using the following prim- 
ers: PDGFRActcfF: 5’-GTCACAGTAGAACCACAGAT-3’; PDGFRActcfR: 
5/-TAAGTATACTGGTCCTCCTC-3’. Equal masses of ChIP or input (WCE) 
DNA were used as input for PCR, and CTCF occupancy was quantified as a ratio 
between ChIP and WCE, determined by 2-4“. CTCF peak intensity was further 
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normalized as ratio to two invariant peaks, at PSMBI1 and SPG11, using the fol- 
lowing primers: PSMB IctcfF: 5/-CCTTCCTAGTCACTCAGTAA-3’; PSMBIctcfR: 
5'-CAGTGTTGACTCATCCAG-3’; SPG1 IctcfF: 5’-CAGTACCAGCCTC 
TCTAG-3’; SPG1 IctcfR: 5’-CTAAGCTAGGCCTTCAAG-3’. 

Cross-boundary and intra-domain gene pair correlation analysis. RNA-seq data 
for 357 normal brain samples was downloaded from GTEx”’. RNA-seq data and 
copy number profiles for lower grade gliomas were downloaded from TCGA”>4. 
Contact domains of IMR90, GM12878, K562 and NHEK cells were obtained from 
published HiC data!>. Genes were assigned to the inner-most domain in which 
their transcription start site fell within. Gene pairs were considered to be in the 
same domain if they were assigned to the same domain in both GM12878 and 
IMR90. Gene pairs were considered to span a boundary if they were assigned to 
different domains in both GM12878 and IMR90, and separated by a CTCF-binding 
site in IDH wild-type tumours. Gene pairs that did not fit either criterion were 
excluded from this analysis. The plot of correlation vs distance for brain GTEx 
samples is based on Pearson correlations for all relevant pairs, smoothed by locally 
weighted scatterplot smoothing with weighted linear least squares (LOESS). To 
assess the bias in correlation differences, we computed the difference of Pearson 
correlations between wild-type and IDH mutant gliomas for all gene pairs sepa- 
rated by <180kb. In Fig. le, this difference in correlations is plotted against the sig- 
nificance of this difference (estimated by Fisher z-transformation). For each gene 
pair, we omitted samples with a deletion or amplification of one of the genes at or 
above threshold of the minimal arm level deletion or amplification (to avoid copy 
number bias). To ensure robustness, we also repeated the analysis using boundaries 
defined from HiC data for K562 and NHEK. This yielded similar results: 84% pairs 
gaining correlation cross boundary versus 71% expected (P< 8 x 10° *), 54% pairs 
losing correlation are within the same domain versus 29% expected (P<3 x 10-°), 
Repeating the analysis with only the 14,055 genes that have expressed over 1 tran- 
scripts per million (TPM) in at least half the samples also yielded similar results 
(Extended Data Fig. 7): 92% pairs gaining correlation cross boundary versus 69% 
expected (P< 2 x 10° %), 73% pairs losing correlation are within the same domain 
versus 31% expected (P< 8 x 10°“). 

Genomic scan for deregulated boundaries. To detect boundaries deregulated 
in IDH mutant gliomas, we scanned for gene pairs, separated by <1 Mb, with a 
significant difference in correlation between wild-type and [DH mutant tumours 
(Fisher z-transformation, FDR <1%). We omitted amplified or deleted samples 
as described above. To ensure robustness to noise from lowly expressed genes, 
we first filtered out 6,476 genes expressed <1 TPM in more than half of the sam- 
ples (keeping 14,055 genes). We considered all domains and boundaries scored in 
IMR90 HiC data!’ Gene pairs crossing a CTCF peak and an IMR90 boundary (that 
is, can be assigned to different domains) that were significantly more correlated 
in IDH mutant tumours were considered to support the loss of that boundary. 
Gene pairs not crossing a boundary (that is, can be assigned to the same domain) 
that were significantly less correlated in IDH mutant tumours were considered to 
support the loss of a flanking boundary. We collated a set of deregulated bound- 
aries, supported by at least one cross-boundary pair gaining correlation and at 
least one intra-domain pair losing correlation. Each was assigned a P value equal 
to the product of both supporting pairs (best P value was chosen if there were 
more supporting pairs). If both boundaries of a domain were deregulated, or if the 
same pair of gene pairs (one losing and one gaining correlations) were supporting 
more than one boundary due to overlapping domains, the entries were merged 
(Supplementary Table 1). This definition allows every gene pair to be considered 
as potential support for a boundary loss. To quantify CTCF occupancy over these 
deregulated boundaries, we averaged the signal over all CTCF peaks located within 
a 1-kb window around the boundary, using copy number and quantile normalized 
CTCE signals. To quantify DNA methylation over the deregulated boundaries, 
we averaged DNA methylation signals from TCGA data in 200-bp windows as 
above. Figure 2a depicts significance of disrupted domains and the fold change of 
genes in them that are upregulated in DH mutant tumours (compared to median 
expression in wild type). In addition to PDGFRA, top-ranking genes include CHD4 
(P< 10~*), a driver of glioblastoma tumour initiation*®, LICAM (P< 10-8), a 
regulator of the glioma stem cells and tumour growth”, and other candidate reg- 
ulators (Supplementary Table 1). 

To ensure robustness to cell-type-specific boundaries, we repeated the analysis 
with GM12878-, K562- and NHEK-defined boundaries. This yielded very similar 
results, and again highlighted PDGFRA as an overexpressed gene adjacent to a 
disrupted boundary. 

TCGA correlation and outcome analysis. For the correlation of FIP1L1 and 
PDGERA expression, RNA-seq data from the TCGA lower grade glioma (LGG) 
and glioblastoma (GBM) data sets”** were downloaded and segregated by IDH 
mutation status and subtype. Patients from the proneural subtype were divided 
by IDH mutation status, while patients from the mesenchymal, classical or neural 
subtypes (which had no IDH mutations) were classified as ‘other’. For correlation 
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analysis, patients with copy number variation in either gene were excluded from 
the analysis to control for effects of co-amplification. For outcome analysis, LGG 
RNA-seq data and corresponding patient survival data was obtained from TCGA. 
Patients with sum PDGFRA and FIP1L1 expression of at least one-half of one 
standard deviation above the mean were classified as ‘high PDGFRA and FIP1L1 
expression (= 17), while all other patients were classified as ‘low PDGFRA and 
FIP1L1 expression (n= 201). Data were plotted as Kaplan-Meier curves and sta- 
tistically analysed via log-rank test. 

HiC data analysis and visualization. HiC data'> were downloaded from GEO. 
5-kb resolution intra-chromosomal contact scores for chromosome 4 for the cell 
lines IMR90, NHEK, KBM7, K562, HUVEC, HMEC and GM12878 were filtered 
to the region between 53,700 and 55,400 kb. The average interaction score at each 
coordinate pair for all cell lines was calculated and used to determine putative insu- 
lator elements as local maxima at the interaction point of two domain boundaries. 
To determine the interactions of the PDGFRA promoter, the interaction scores 
of all points in the region with the PDGFRA promoter (chr4: 55,090,000) were 
plotted as a one-dimensional trace. To view the topological domain structure of 
the region, HiC interaction scores were visualized using Juicebox (http://www. 
aidenlab.org/juicebox/)'°. Data shown is from the IMR90 cell line at 5-kb resolu- 
tion, normalized to coverage. 

DNA methylation quantification. DNA methylation was analysed in 
two ways. For gliomaspheres, genomic DNA was isolated via QiaAmp 
DNA minikit (Qiagen) and subjected to bisulfite conversion (EZ DNA 
Methylation Gold Kit, Zymo Research). Bisulfite-converted DNA spe- 
cific to the CTCF-binding site (defined by JASPAR*”) in the boundary 
adjacent to PDGFRA was amplified using the following primers forward: 
5'-GAATTATAGATAATGTAGTTAGATGG-3’, reverse: 5/-AAATATACTA 
ATCCTCCTCTCCCAAA-3’. Amplified DNA was used to prepare a sequencing 
library, which was sequenced as 38-base paired-end reads on a NextSeq500. 
For tumours, limiting DNA yields required an alternative strategy for 
methylation analysis. Tumour genomic DNA was isolated from minced fro- 
zen sections of tumours by QiaAmp DNA minikit (Qiagen). Genomic DNA 
was digested using the methylation-sensitive restriction enzyme Hin6I 
(Thermo) recognizing the restriction site GCGC, or subjected to mock 
digestion. Protected DNA was quantified by PCR using the following primer 
set: PDGFRAinsF: 5’-CGTGAGCTGAATTGTGCCTG-3’, PDGFRAinsR: 
5'-TGGGAGGACAGTTTAGGGCT-3’, normalizing to mock digestion. 

3C analysis. 3C analysis was performed using procedures as described previ- 
ously*!?, In brief, ~10 million cell equivalents from minced tumour specimens or 
gliomasphere cultures were fixed in 1% formaldehyde. Fixed samples were lysed in 
lysis buffer containing 0.2% PMSF using a Dounce pestle. Following lysis, samples 
were digested with HinDIII (NEB) overnight on a thermomixer at 37°C rotating at 
950r.p.m. Diluted samples were ligated using T4 DNA ligase (NEB) at 16°C over- 
night, followed by RNase and proteinase K treatment. DNA was extracted via phe- 
nol/chloroform/isoamyl alcohol (Invitrogen). DNA was analysed via TaqMan PCR 
using ABI master mix. Primers and probe were synthesized by IDT with the follow- 
ing sequences: common PDGFRA promoter: 5’/-GGTCGTGCCTTTGTTTT-3’; 
FIP1L1 control: 5’/-CAGGGAAGAGAGGAAGTTT-3’; FIPIL1 enhancer: 
5'-TTAAGTAAGCAGGTAAACTACAT-3’; intragenic enhancer: 5/-AGCC 
TTTGCCTCCTTTT-3'; intragenic control: 5’-CCACAGGGAGAAGGAAAT-3/; 
intact promoter: 5'-CAAGGAATTCGTAGGGTTC-3’; probe: 5/-/56-FAM/ 
TTGTATGCG/ZEN/AGATAGAAGCCAGGGCAA/3IABKFQ/-3’. For the 
reciprocal FIP1L1 enhancer interaction interrogation, the following primer 
sequences were used: common enhancer primer (as FIP1L1 enhancer primer 
above): 5’-TTAAGTAAGCAGGTAAACTACAT-3’, PDGFRA promoter 
(as common PDGFRA promoter above): 5’-GGTCGTGCCTTTGTTTT-3’; 
SCFD2 promoter: 5’-AATACATGGTCATGATGCTC-3’; FIPIL1 promoter: 
5’-AGGCATTGCTTAAACATAAC-3’; FIPIL1 control: 5'/-TTATTTGTAGT 
AGAGGTTACTGG-3’; PDGFRA control: 5‘-ATGATAACACCACCATTCAG-3’; 
FIP1L1 enhancer probe: 5’/-/56-FAM/TATCCCAAC/ZEN/CAAATACAGGG 
CTTGG/3IABkFQ/-3’. To normalize primer signals, bacterial artificial chro- 
mosome (BAC) clones CTD-2022B5 and RP11-626H4 were obtained from 
Invitrogen. BAC DNA was purified via BACMAX DNA Purification kit (Epicentre) 
and quantified using two primer sets specific to the Chloramphenicol resist- 
ance gene: 1F: 5/-TTCGTCTCAGCCAATCCCTG-3’; 1R: 5’-TTTGCCCATG 
GTGAAAACGG-3’; 2F: GGTTCATCATGCCGTTTGTG-3’; 2R: 5’-CCACTCAT 
CGCAGTACTGTTG-3’. BAC DNA was subjected to a similar 3C protocol, omit- 
ting steps related to cell lysis, proteinase or RNase treatment. PCR signal from 
tumour and gliomasphere 3C was normalized to digestion efficiency and BAC 
primer signal. 

Treatment with demethylating agent. BT 142 cells were cultured in either 5 4M 
5-azacytidine or equivalent DMSO (1:10,000) for 8 days, with drug refreshed 
every 2 days. 


CRISPR/Cas9 insulator disruption. The following CRISPR sgRNAs 
were cloned into the LentiCRISPR vector obtained from the Zhang labora- 
tory’*: GFP: 5’/-GAGCTGGACGGCGACGTAAA-3’; insulator: 5/-GCCACA 
GATAATGCAGCTAGA-3’. GSC6 gliomaspheres were mechanically disso- 
ciated and plated in 51g ml~’ EHS laminin (Sigma) and allowed to adhere 
overnight, and then infected with lentivirus containing either CRISPR vec- 
tor for 48 h. Cells were then selected in 1g ml~! puromycin for 4 days, with 
puromycin-containing media refreshed every 2 days. Genomic DNA was iso- 
lated and the region of interest was amplified using the PDGFRAins primer 
set described above. CRISPR-mediated disruption of this amplified DNA was 
confirmed via Surveyor Assay (Transgenomic), with amplified uninfected 
GSC6 genomic DNA being added to each annealing reaction as the unmod- 
ified control. To quantify the precise CRISPR alterations, genomic DNA from 
each construct was amplified using a set of primers closer to the putative dele- 
tion site as follows: forward: 5’-TTTGCAATGGGACACGGAGA-3’, reverse: 
5/-AGAAATGTGTGGATGTGAGCG-3’. PCR product from these primers was 
used to prepare a library that was sequenced as 38-base paired-end reads on the 
Illumina NextSeq500. 

PDGFRA qPCR. Total RNA was isolated from CRISPR-infected GSC6 
gliomaspheres (insulator or control GFP sgRNA) or BT 142 gliomaspheres 
(5-aza-treated or control condition) using the RNeasy minikit (Qiagen) and 
used to synthesize cDNA with the SuperScriptIII system (Invitrogen). CDNA 
was analysed using SYBR mastermix (Applied Biosystems) on a 7500 Fast 
Real Time System (Applied Biosystems). PDGFRA expression was determined 
using the following primers: forward: 5’-GCTCAGCCCTGTGAGAAGAC-3/, 
reverse: 5‘-ATTGCGGAATAACATCGGAG-3’, and was normalized to 
primers for ribosomal protein, large, PO (RPLPO), as follows: forward: 
5'-TCCCACTTGCTGAAAAGGTCA-3/, reverse: 5‘-CCGACTCTTCCTTG 
GCTTCA-3’. Normalization was also verified by $-actin (ACTB), forward: 
5’-AGAAAATCTGGCACCACACC-3’, reverse: 5’-AGAGGCGTACAGG 
GATAGCA-3’. 

PDGFRa flow cytometry. Cells were incubated with PE-conjugated anti- 
PDGFRa (CD140a) antibody (Biolegend, clone 16A1) for 30 min at room temper- 
ature at the dilution specified in the manufacturer’s protocol. Data was analysed 
and visualized with FlowJo software. Single live cells were selected for analysis 
via side and forward scatter, and viable cells were selected by lack of an unstained 
channel (APC) autofluorescence. 

Cell growth assay. For the cell growth assay, 2,500 dissociated viable GSC6 cells 
expressing CRISPR and either GFP or insulator-targeting sgRNA (see above) was 
plated in 10011 of media in an opaque-walled tissue culture 96-well plate, in 11M 
dasatinib, 500nM crenolanib, or equivalent DMSO (1:10,000) as a vehicle control. 
Cell growth was analysed at days 3, 5 and 7 for dasatinib, or days 3, 7 and 10 for 
crenolanib, using CellTiter-Glo reagent (Promega) following the manufacturer's 
protocol. Data were normalized across days using an ATP standard curve. 
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Extended Data Figure 1 | DNA methylation and CTCF binding at considered. Methylation levels were determined from whole-genome 
deregulated boundaries. a, Box plots show DNA methylation levels over bisulfite data for three [DH mutant (red labels) and three IDH wild-type 
CTCE sites (200-bp window centred on the peak) within boundaries (black labels) tumours. b, Bars show average normalized ChIP-seq signal 
predicted by gene pair correlation analysis to be disrupted. All CTCF over all CTCF sites located inside a 1-kb window centred on a disrupted 
sites located within a 1-kb window centred on a disrupted boundary were boundary. 
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Extended Data Figure 2 | Expression of Fip111 in mouse brain cells and expression is a negative prognostic factor in IDH1 mutant lower-grade 
survival effects of PDGFRA and FIP1L1. a, Expression of FipiI1 ina gliomas. Multivariate analysis including the known prognostic factor 
published data set for isolated mouse brain cell types**. b, Kaplan-Meier 1p/19q deletion diminished this effect into non-significance, suggesting 


plot based on TCGA data’? indicates that combined FIPILI and PDGFRA that other predictors of survival may also have a role in this model. 
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Extended Data Figure 3 | CTCF-anchored loop in the PDGFRA 

region. a, Schematic depiction of a HiC interaction signature of a 
CTCF-anchored loop domain, compared to an ordinary domain, as 
described previously!°. CTCF-anchored loop domains are characterized 
by an increased interaction score at the apex of the domain, representing 

a CTCF-CTCF dimeric interaction. b, IMR90 HiC contact matrix for the 
PDGFRA/FIPI1L1 locus, as presented in Fig. 3a. Solid circle indicates CTCF 


opirea airy 


dimer interaction point; dashed circles indicate lack of CTCF dimeric 
anchor signature. c, IMR90 HiC contact matrix as in b, but with an 
expanded heatmap scale, more clearly conveys the CTCF-anchored loop 
that insulates PDGFRA. d, e, HiC contact matrix for GM12878 cells for the 
same region confirms a single CTCF-anchored loop (solid circle) between 
PDGFRA and FIP1L1. These data support the significance of this specific 
boundary in locus topology and PDGFRA insulation. 
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Extended Data Figure 4 | Characterization of the FIP1L1 enhancer. 

a, H3K27ac ChIP-seq track for GSC6 gliomaspheres reveals strong 
enrichment over the FIP1L1 enhancer. CTCF ChIP-seq track reveals 
location of the boundary element insulator (as in Fig. 3a). FIPIL1 
enhancer (i) and promoter (ii) are indicated. b, H3K27ac ChIP-seq tracks 
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enrichment over the FIP1L1 enhancer. c, ChIP-seq tracks for glioma 
master transcription factors and other histone modifications support the 
enhancer identity of the element (H3K27ac, H3K4mel, SOX2, OLIG2; 
lacks H3K4me3, lacks H3K27me3). By contrast, the FIPIL1 promoter has 
a distinct ‘promoter-like’ chromatin state. 
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Extended Data Figure 5 | Interaction of the FIP1L1 enhancer with wild-type and mutant tumours and models. In IDH wild-type gliomas, 
nearby promoters and PDGFRA quantified by reciprocal 3C. Top, the it shows essentially no interaction with the PDGFRA promoter. In IDH 
H3K27ac, CTCF and genetic architecture of the FIP1L1/PDGFRA locus mutant gliomas, it interacts with the PDGFRA promoter with comparable 
is indicated, highlighting the 3C strategy. Bottom, plots indicate the strength to the local interactions, despite the much larger intervening 
interaction signal of the indicated sites (black lines) with the common distance (900 kb). Error bars reflect s.d. 


enhancer primer. The FIP1L1 enhancer interacts with local promoters in 
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GSC6 Growth in PDGFRA Inhibition 
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Extended Data Figure 6 | Crenolanib reverses the increased growth 

of PDGFRA insulator disrupted cells. Insulator CRISPR-infected 
gliomaspheres exhibit a roughly twofold increase in proliferation rate, 
compared to control sgRNA-infected gliomaspheres. This proliferative 
advantage is eliminated by treatment with the PDGFRa inhibitor 
crenolanib. Crenolanib and dasatinib both inhibit PDGFRa, but their 
other targets are non-overlapping. Hence, this sensitivity provides further 
support that PDGFRA induction drives the increased proliferation of the 
insulator CRISPR gliomaspheres. Error bars reflect s.d. 
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Extended Data Figure 7 | Signature of boundary deregulation in IDH 
mutant gliomas is robust. Volcano plot depicts the significance (y axis) of 
gene pairs that are either more or less correlated in JDH mutant than IDH 
wild-type gliomas. This plot was generated by repeating the analysis in the 
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main text and shown in Fig. Lf, except that here the statistics were performed 
using only the 14,055 genes expressed at >1 TPM in at least half of the 
samples. This indicates that the boundary deregulation signature in IDH 
mutant gliomas is not sensitive to noise from lowly expressed genes. 
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Extended Data Table 1 | Clinical specimens and tumour models 


Glioma 


GBM1w 
GBM2w 
GBM3w 
GBM4w 
GBM5w 
GBM6w 
GBM7w 
AA15m 
AA16m 
AA17m 
OD18m 
AA19m 
GSC4 
GSC6 
BT142 
GSsc119 


Clinical information for glioma specimens and gliomasphere models is shown. 


Tissue Type 


Autopsy Specimen 
Surgical Specimen 
Surgical Specimen 
Surgical Specimen 
Surgical Specimen 
Surgical Specimen 
Surgical Specimen 
Surgical Specimen 
Surgical Specimen 
Surgical Specimen 
Surgical Specimen 
Surgical Specimen 
Gliomasphere 
Gliomasphere 
Gliomasphere 
Gliomasphere 


Tissue Source 


Banked 
Banked 
Banked 
Banked 
Fresh 
Fresh 
Fresh 
Banked 
Banked 
Fresh 
Fresh 
Fresh 


Source 


MGH 
MGH 
MGH 
MGH 
MGH 
MGH 
MGH 
MGH 
MGH 
MGH 
MGH 
MGH 
MGH 
MGH 
ATCC 
MGH 


IDH1 Status 


Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
R132H 
R132H 
R132H 
R132H 
R132H 
Wild Type 
Wild Type 
R132H 
R132H 


PDGFRA Status 


Amplified 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 
Wild Type 


1p/19q Status 


Not tested 
Not tested 
Not tested 
Not tested 
Not tested 
Not tested 
Not tested 
Intact 
Intact 
Intact 
Lost 
Intact 
Intact 
Intact 
Intact 
Intact 
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Grade 


Disease 


Glioblastoma 
Glioblastoma 
Glioblastoma 
Glioblastoma 
Glioblastoma 
Glioblastoma 
Glioblastoma 
Anaplastic Astrocytoma 
Anaplastic Astrocytoma 
Anaplastic Astrocytoma 
Oligodendroglioma 
Anaplastic Astrocytoma 
Glioblastoma 
Glioblastoma 
Anaplastic Oligoastrocytoma 
Secondary Glioblastoma 
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Extended Data Table 2 | Sequenced libraries characteristics 


: Total read 
Sample Name Experiment Sequencing Depth Sequencing Format Berne liene number 
Instrument ~ 
(millions) 
GBM1w - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 19.3 
GBM2w - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 17.6 
GBM3w - CTCF CTCF ChIP-seq 38 base pairs Paired end IIlumina NextSeq 500 20.2 
GBM5w - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 30 
GBM6w - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 35.1 
GBM7w - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 36 
AA15m - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 8.7 
AA16m - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 23.7 
AA17m - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 16.3 
OD18m - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 9.2 
AA19m - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 33 
GSC4 - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 19.9 
GSC6 - CTCF CTCF ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 21.9 
BT142 - CTCF CTCF ChIP-seq 38 base pairs Paired end IIlumina NextSeq 500 16 
GSC119 - CTCF CTCF ChIP-seq 50 base pairs Single end Illumina Miseq 6.39 
GBM1w - H3K27ac ~—-H3K27ac ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 12.7 
GBM2w - H3K27ac —H3K27ac ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 10.8 
AA15m - H3K27ac ~~ H3K27ac ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 11.8 
GSC4 -H3K27ac H3K27ac ChIP-seq 38 base pairs Paired end Illumina NextSeq 500 9.7 
GSC6 -H3K27ac H3K27ac ChIP-seq 36 base pairs Single end Illumina Hiseq 2500 10.5 
GSC119 - H3K27ac H3K27ac ChIP-seq 38 base pairs Paired end IIlumina NextSeq 500 9 
aexe a Locus Sequencing 50 base pairs Single end Illumina Miseq 0.539 
i peat as Locus Sequencing 50 base pairs Single end Illumina Miseq 0.639 
insulator sgRNA 
GSC4 bisulfite Bisulfite Sequencing 38 base pairs Paired end Illumina NextSeq 500 0.149 
GSC6 bisulfite Bisulfite Sequencing 38 base pairs Paired end Illumina NextSeq 500 0.149 
BT142 bisulfite Bisulfite Sequencing 38 base pairs Paired end Illumina NextSeq 500 0.149 
GSC119 bisulfite Bisulfite Sequencing 38 base pairs Paired end Illumina NextSeq 500 0.156 


Pertinent statistics are listed for ChIP, genomic DNA and bisulfite-converted sequencing libraries. 
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ILLUSTRATION BY THE PROJECT TWINS. 


TOOLBOX 


THE UNSUNG HEROES OF 
SCIENTIFIC SOFTWARE 


Creators of computer programs that underpin experiments don’t always get their 
due — so the website Depsy is trying to track the impact of research code. 


</> 3 


BY DALMEET SINGH CHAWLA 


or researchers who code, academic 
Pee for tracking the value of their 
work seem grossly unfair. They can 
spend hours contributing to software that 
underpins research, but if that work does not 
result in the authorship of a research paper 
and accompanying citations, there is little way 
to measure its impact. 
Take Klaus Schliep, a postdoctoral researcher 


who is studying evolutionary biology at the 
University of Massachusetts in Boston. His 
Google Scholar page lists the papers that he has 
authored — including his top-cited work, an 
article describing phylogenetics software called 
phangorn — but it does not take into account 
contributions that he has made to other people's 
software. “Compared to writing papers, coding 
is treated as a second-class activity in science,” 
Schliep says. 

Enter Depsy, a free website launched in 


November 2015 that aims to “measure the 
value of software that powers science”. 
Schliep’s profile on that site shows that he 
has contributed in part to seven software pack- 
ages, and that he shares 34% of the credit for 
phangorn. Those packages have together 
received more than 2,600 downloads, have 
been cited in 89 open-access research papers 
and have been heavily recycled for use in other 
software — putting Schliep in the 99th percen- 
tile of all coders on the site by impact. “Depsy > 


7 JANUARY 2016 | VOL 529 | NATURE | 115 


© 2015 Macmillan Publishers Limited. All rights reserved 


» does a good job in finding all my software 
contributions,” says Schliep. 

Depsy’s creators hope that their platform 
will provide a transparent and meaningful 
way to track the impact of software built by 
academics. The technology behind it was 
developed by Impactstory, a non-profit firm 
based in Vancouver, Canada, that was founded 
four years ago to help scientists to track the 
impact of their online output. That includes 
not just papers but also blog posts, data sets 
and software, and measuring impact by diverse 
metrics such as tweets, views, downloads and 
code reuse, as well as by conventional citations. 

In effect, Depsy recognizes the “unsung 
heroes” of scientific software, says Jason Priem, 
co-founder of Impactstory, which is funded by 
the US National Science Foundation and vari- 
ous philanthropic foundations. 

Such a tool is needed, notes Neil Chue Hong, 
founding director of the Software Sustainabil- 
ity Institute in Edinburgh, UK, because there 
are few ways to credit scientists for their soft- 
ware. Young researchers are enthusiastic about 
coding, he says. Last year, he ran a survey of 
1,000 randomly selected UK scientists, which 
suggested that more than 50% of researchers 
develop their own code. Even so, few UK aca- 
demics listed code or software as one of their 
research outputs in the nation’s latest research 
quality audit (the “Research Excellence Frame- 
work’) even in disciplines such as computer 
science that rely heavily on software. “There 
is a culture that reinforces the idea that pro- 
ducing and publishing code has no perceived 
benefit to the researcher,’ Hong says. 


TRACKING SOFTWARE USE 

The usual way to track academic impact — by 
counting citations — still has some relevance 
to software. Researchers can write papers that 
describe their software, as Schliep has done for 
his phangorn package, so that anyone who uses 
the program can cite it in subsequent papers. 
But counting citations is an imperfect meas- 
ure. Researchers may not know which paper to 
cite, argues Priem, because software packages 
often have multiple articles associated with 
them — and some pivotal software projects, 
he says, such as the GDAL Python library, are 
not linked to a canonical paper. 

If software has no associated paper, there is 
no universally recognized way to cite it. Still, 
it is now quite common for coders to assign 
digital object identifiers (DOIs) to their code, 
and increasingly to their data sets as well, notes 
Martin Fenner, technical director of the online 
repository DataCite in Hanover, Germany. 
Software is often first stored in the popular 
code repository GitHub, from which a copy 
can be automatically archived on scholarly 
focused repositories such as Zenodo or Fig- 
share, which allocate DOIs to software and 
thus make it a citable object. Other initiatives 
are trying to ensure that research papers cite 
software in a standardized format — such as by 


using the Research Resource Identifier. 

But counting citations of software DOIs, 
papers or any other standard format does not 
reveal the full impact of coders on science, 
because software so often goes uncited. A 
2015 analysis of 90 random biology papers 
found that two-thirds informally mentioned 
the use of software, but fewer than half of those 
papers actually cited the package. 

Depsy searches through research papers 
to discover both citations and informal men- 
tions of software — of which, unsurprisingly, 
it has found many, 
says Priem, such 


“There is aculture 
as in the acknowl- ‘ 
: that reinforces 

edgement sections ° 

: the idea that 
or the main text of hick d 
academic papers. pabeter rae desir 
But a limitation PUDlsning code 
of the site, Priem has no perceived 
admits, is that it benefit to the 


currently searches 'esear cher.” 


only open-access 
research papers — missing the vast bulk of 
paywalled scholarly content. Impactstory will, 
however, negotiate with publishers for permis- 
sion to mine the text of paid-access literature. 
Mentions in research papers are one of three 
ways in which Depsy tracks the impact of soft- 
ware, Priem says. Second, the site tracks how 
code is reused by others. The name Depsy 
originates from ‘dependency network — an 
overarching term for a map of factors that 
depend on each other, such as software packages 
that recycle code from other packages. Depsy 
calculates the extent to which code is recycled 
by using Google's PageRank algorithm, which 
gives weight to reuse by more-prominent soft- 
ware. From the view of measuring impact, an 
example of code reuse may be more meaning- 
ful than a citation in the literature, Priem notes. 
And third, the site gathers download statistics 
on code packages by trawling through CRAN 
and PyPI, which are the main repositories for 
software written in the popular R and Python 
programming languages, respectively. 


FOCUS ON RESEARCH 

Other websites do some of what Depsy offers. 
Crantastic, for example, is a review site that 
tracks the most popular R packages, and PyPI 
ranking lists the most popular Python modules 
by tracking downloads from PyPI. In addition, 
a few commercial services such as VersionEye 
and Libraries.io track dependency networks, 
explaining which software depends on which 
other packages. 

But Depsy is unconventional in its focus on 
research software, which it distinguishes from 
other code by identifying key words and the 
descriptions and titles of software — although 
the classification process is imperfect, Priem 
says. The site tracks other code, but it includes 
research software only when it calculates the 
percentile impact rankings for academics such 
as Schliep. 
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Depsy apportions fractional credit to each 
participant who has contributed to a soft- 
ware package by counting the percentage of 
code that they have contributed or edited — 
known in the programming world as a per- 
son's ‘commits. Fingerprints of each commit 
are saved in the code, making it easy to track 
down the originator. But not every edit has 
the same impact, and Depsy currently cannot 
distinguish between important contributions 
and trivial ones. The tool may be adapted to 
attempt this distinction — by tracking the 
influence of individual commits — in the 
future, says Priem. 

Depsy also enables users to determine the 
software with the highest impact in specific 
disciplines. A search on Depsy for ‘astrophys- 
ics, for instance, yields 11 software packages, 
of which an analysis and visualization toolkit 
for astrophysical simulations called ‘yt’ has the 
greatest impact; it lies in the 97th percentile of 
all packages. 


OBSTACLES TO PROGRESS 

One of Depsy’s restrictions, notes Hong, is that 
it only tracks code that is available in public 
repositories — so it cannot show the impact 
of commercial software. Moreover, the site 
tracks software in only two coding languages: 
Rand Python. 

But Depsy’s creators aim to eventually 
include other coding languages, and to add 
a fourth way to measure impact: a social- 
influence metric that would take into 
account the number of stars that software 
packages receive from other GitHub users, 
and how many times a piece of software is 
discussed online. 

The site’s code-reuse metrics have their 
limitations, too. Researchers often reuse their 
own code, but might ‘game’ Depsy by repeat- 
edly doing so to garner better profile scores 
— the software equivalent of citing your own 
paper. Another way for researchers to game the 
site might be to start lots of projects but not 
to finish them, Fenner warns, leaving others 
to refine them instead; the project originator 
could then claim credit after the fine-tuned 
versions of their software become prominent. 

“I would love to get to the place where 
people are trying to game Depsy, because it 
would mean people are taking software reuse 
seriously,” Priem says. 

Ultimately, transparent metrics that demon- 
strate the impact of code might enable software 
creators to secure larger funds during grant 
reviews, Hong hopes. Science's coders deserve 
more funding and support, he says — but get- 
ting to that point requires a culture change 
from everyone involved in scientific research. 
“The real irony is that by not rewarding the use 
of software, we're actually putting roadblocks 
in the way of science,’ Hong says. m 


Dalmeet Singh Chawla is a science journalist 
based in London. 
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DATA SHARING 


An open mind on open data 


The move to make scientific findings transparent can be a major boon to research, but it can 
be tricky to embrace the change. 


BY VIRGINIA GEWIN 


acall to make research data, software code 
and experimental methods publicly avail- 
able and transparent. A spirit of openness is 
gaining traction in the science community, and 
is the only way, say advocates, to address a ‘crisis 
in science whereby too few findings are success- 
fully reproduced. Furthermore, they say, it is the 
best way for researchers to gather the range of 
observations that are necessary to speed up 
discoveries or to identify large-scale trends. 
The open-data shift poses a conundrum for 


I isamovement building steady momentum: 


junior researchers, who are carving out their 
niche. On the one hand, the drive to share is 
gathering official steam. Since 2013, global 
scientific bodies — including the European 
Commission, the US Office of Science and 
Technology Policy and the Global Research 
Council — have begun to back policies that sup- 
port increased public access to research. 

On the other hand, scientists disagree about 
how much and when they should share data, 
and they debate whether sharing it is more likely 
to accelerate science and make it more robust, 
or to introduce vulnerabilities and problems. 

As more journals and funders adopt 
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data-sharing requirements, and as a growing 
number of enthusiasts call for more openness, 
junior researchers must find their place between 
adopters and those who continue to hold out, 
even as they strive to launch their own careers. 

One key challenge facing young scientists is 
how to be open without becoming scientifi- 
cally vulnerable. They must determine the risk 
of jeopardizing a job offer or a collaboration 
proposal from those who are wary of — or 
unfamiliar with — open science. And they 
must learn how to capitalize on the move- 
ment’s benefits, such as opportunities for more 
citations and a way to build a reputation > 
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> without the need for conventional metrics, 
such as publication in high-impact journals. 

The nascent era of openness is best embodied 
by the Transparency and Openness Promotion 
(TOP) guidelines for journals, first published’ 
in Science by researchers at the Center for Open 
Science in Charlottesville, Virginia. Adoption of 
the guidelines by a journal or organization signi- 
fies to the research community that it supports 
transparency, openness and reproducibility 
(whether an experiment can be replicated by 
the original researcher or by someone else). 

Those tenets apply to all aspects of science, 
including experimental design, data sharing and 
the publication of null findings and replication 
studies. As Nature went to press, 538 publishers 
and journals — including Elsevier and Springer 
Nature — had signed up to the TOP guidelines, 
along with 57 organizations, among them the 
American Association for the Advancement of 
Science, which publishes Science. 


A DRIVE TO REPRODUCE 

Some fields have embraced open data more 
than others. Researchers in psychology, a field 
rocked by findings of irreproducibility in the 
past few years, have been especially vocal pro- 
ponents of the drive for more-open science. In 
one of the latest examples of irreproducibility 
issues, investigators tried to replicate results 
from 100 psychological studies but succeeded 
in fewer than half of them’. 

A few psychology journals have created 
incentives to increase interest in reproduc- 
ible science — for example, by affixing an 
‘open-data’ badge to articles that clearly state 
where data are available. According to social 
psychologist Brian Nosek, executive director 
of the Center for Open Science, the average 
data-sharing rate for the journal Psychological 
Science, which uses the badges, increased ten- 
fold to 38% from 2013 to 2015. 

Funders, too, are increasingly adopting an 
open-data policy. Several strongly encourage, 


and some require, a data-management plan 
that makes data available. The US National 
Science Foundation is among these. “There 
used to be no enforcement, but that’s chang- 
ing,” says Karthik Ram, a data scientist at the 
Berkeley Institute for Data Science in California 
and co-founder of ROpenSci, which develops 
open-source software programmes. Some phil- 
anthropic funders, including the Bill & Melinda 
Gates Foundation in Seattle, Washington, and 
the Wellcome Trust in London, also mandate 
open data from their grant recipients. 

Others, such as the Gordon and Betty Moore 
Foundation in Palo Alto, California, encourage 
sharing but do not require it. Still, the trend 
is clear, says Carly Strasser, who oversees the 
foundation’s Data-Driven Discovery Initiative. 
“Open science, data sharing, software sharing 
is the future of science,’ she says. “It’s only 
going to get more difficult to engage in science 
without being open.” 

But many young researchers, especially 
those who have not been mentored in open 
science, are uncertain about whether to share 
or to stay private. Graduate students and post- 
docs, who often are working on their lab head’s 
grant, may have no choice if their supervisor or 
another senior colleague opposes sharing. 

Some fear that the potential repercussions 
of sharing are too high, especially at the early 
stages of a career. “Everybody has a scary story 
about someone getting scooped,’ says New York 
University astronomer David Hogg. Those fears 
may be a factor ina lingering hesitation to share 
data even when publishing in journals that 
mandate it (see Nature 515, 478; 2014). 

Researchers at small labs or at institutions 
focused on teaching arguably have the most 
to lose when sharing hard-won data. “With my 
institution and teaching load, I don't have post- 
docs and grad students,” says Terry McGlynn, a 
tropical biologist at California State University, 
Dominguez Hills. “The stakes are higher for 
me to share data because it’s a bigger fraction 


LEARN TO SHARE 


Open -data pro tips 


Scientists who are cautious about open 
science can start small by sharing data for 
a project that they have already completed. 
Specialists in the field offer this advice: 

@ Documenta data-deposition plan while 
working on publications, so that the data 
and the paper will be ready for publication at 
the same time. It is not necessary, however, 
to release data alongside a paper, unless a 
funder mandates it. 

@ Craft a very explicit statement about data 
reuse — including who can use the data, 
how to use them and how to attribute them. 
@ Machine-readable data will be most 
easily combined with other data sets. Avoid 


proprietary data formats, such as Microsoft 
spreadsheets, or colour-coded cells that are 
readable only by humans. 

@ Permanenily archive data in reputable 
repositories such as FigShare or Zenodo, not 
on a personal website. 

@ If you choose to share data from a 

new project, make sure to generate the 
relevant metadata as you go. It is very hard 
to reconstruct important details after the 
fact. Tools such as those on Zenodo enable 
researchers to document such details 
throughout a project, so that all you have to 
do is flip a switch when you are ready 

to share. V.G. 
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of what’s happening in my lab” 

Researchers also point to the time sink that 
is involved in preparing data for others to view. 
Once the data and associated materials appear 
in a repository, answering questions and han- 
dling complaints can take many hours. 

The time investment can present other 
problems. In some cases, Ram says, it may 
be difficult for junior researchers to embrace 
openness when senior colleagues — many of 
whom head tenure and promotion committees 
— might scoff at what they may view as mis- 
placed energies. “I've heard this recently — that 
embracing the idea of open data and code makes 

traditional academ- 


ee ee 
lata sharin. - 
software a cern seems to be that 
sharing is open advocates don't 
the future of spend their time 


being as productive 
as possible.” 

An open-science 
stance can also add complexity to a collabora- 
tion. Kate Ratliff, who studies social attitudes at 
the University of Florida in Gainesville, says that 
it can seem as if there are two camps ina field 
— those who care about open science and those 
who dont. “There's a new area to navigate — 
‘Are you cool with the fact that P'll want to make 
the data open?’ — when talking with somebody 
about an interesting research idea,” she says. 


science.” 


GLASS HALF FULL 
Despite complications and concerns, the 
upsides of sharing can be significant. For exam- 
ple, when information is uploaded to a reposi- 
tory, a digital object identifier (DOI) is assigned. 
Scientists can use a DOI to publish each step of 
the research life cycle, not just the final paper. In 
so doing, they can potentially get three citations 
— one each for the data and software, in addi- 
tion to the paper itself. And although some say 
that citations for software or data have little cur- 
rency in academia, they can have other benefits. 
Many advocates think that transparent data 
procedures with a date and time stamp will pro- 
tect scientists from being scooped. “This is the 
sweet spot between sharing and getting credit 
for it, while dissuading plagiarism,” says Ivo 
Grigorov, a project coordinator at the National 
Institute of Aquatic Resources Research Secre- 
tariat in Charlottenlund, Denmark. Hogg says 
that scooping is less of a problem than many 
think. “The two cases I'm familiar with didn't 
involve open data or code,” he says. 

Open science also offers junior researchers 
the chance to level the playing field by gain- 
ing better access to crucial data. Ross Mounce, 
a postdoc studying evolutionary biology at 
the University of Cambridge, UK, is a vocal 
champion of open science, partly because his 
fossil-based phylogenetic research depends on 
access to others’ data. He says that more open- 
ness in science could help to dissuade what 
some perceive as a common practice of shutting 
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out early-career scientists’ requests for data. 

There is some evidence to support that 
statement. A study in 2014 sought data from 
217 studies published between 2000 and 
2013. But the team could secure only 40% of 
what they requested, and responses varied 
according to the requester’s seniority’. 

McGlynn says that many of the obsta- 
cles — whether real or perceived — to open 
science can be sidestepped. He is on the 
editorial board for the journal Biotropica, 
which encourages — but does not require 
— authors to contact the original researcher 
when they use someone else’ archived data, 
which can be embargoed for up to three 
years. “Not only will you get their valuable 
insights, but it’s inclusive and fair,’ he says. 

Communication also helps for those who 
worry about jeopardizing a collaboration, he 
says. Concerns about open science should be 
discussed at the outset ofa study. “Whenever 
you start a project with someone, you have 
to establish a clear understanding of expec- 
tations for who owns the data, at what point 
they go public and who can do what with 
them,” he says. 

It isn't hugely difficult to share data (see 
‘Open-data pro tips’). Online reposito- 
ries such as FigShare or Zenodo make it 
increasingly easy to deposit scientific con- 
tent for widespread consumption. More than 
400 virtual communities have formed to 
share data, software and documented work- 
flows so that a user can deploy them straight 
away, says Tim Smith, who oversees collabo- 
ration and information services at Zenodo. 
The repository launched in May 2013 at 
CERN, Europe’ particle-physics laboratory 
near Geneva, Switzerland. 

And although there is a time cost asso- 
ciated with uploading and organizing 
raw data, subsequent queries can often be 
averted by adding reader-friendly instruc- 
tions at the start. Hogg recommends that 
researchers simultaneously upload tutori- 
als and examples of how to use the content. 

In the end, sharing data, software and 
materials with colleagues can help an early- 
career researcher to garner recognition — a 
crucial component of success. “The thing 
you are searching for is reputation,” says 
Titus Brown, a genomics researcher at the 
University of California, Davis. “To get 
grants and jobs, you have to be relevant 
and achieve some level of public recogni- 
tion. Anything you do that advances your 
presence — especially in a larger sphere, 
outside the communities you know — is a 
net win. » 


Virginia Gewin is a freelance writer in 
Portland, Oregon. 


1. Nosek, B. et al. Science 348, 1422-1425 
(2015). 

2. Open Science Collaboration Science 349, 6251 
(2015). 

3. Magee, A. F. etal. PLoS ONE 9, €110268 (2014). 


TURNING POINT 


Andrew Simons 


From 2008 to 2011, Andrew Simons led a 
programme in Ethiopia for a US-based non- 
profit relief organization. The former biologist 
recently earned a PhD in applied economics 
from Cornell University in Ithaca, New York, 
as a pathway to explore policies that could 
help to improve global food security — reliable 
access to affordable and nutritious food. 


What sparked your interest in helping 
developing nations? 

In 2000, as a biology undergraduate, I spent a 
semester in Latin America studying tropical 
biology. I lived with rural families in Guatemala 
and Nicaragua, where I saw grinding poverty. 
One night, I saw a woman rummaging through 
the garbage to find clothing. It was heartbreak- 
ing. I thought a lot about poverty and the ‘right’ 
response from someone living a relatively 
wealthy life in the United States. 


How did you shift away from biology? 

I went straight to a summer internship at a 
biophysics lab at Texas A&M University in 
College Station. There, I saw a powerful con- 
trast between the economically privileged, 
who had access to technology, and the poor, 
who had no such access. I had always thought I 
would go into molecular genetics and work on 
crops that could improve nutrition and food 
security. But during my internship, I started 
thinking more broadly about how technology 
could be used to help the poor. 


Did you pursue more opportunities overseas? 

Yes. I did a short internship in the Dominican 
Republic with a US-based, Christian interna- 
tional-relief organization that sent groups to 
build a clinic in the slums of Santa Domingo. 
As they got more money, they went on to 
build homes. While there, I searched for and 


found a masters programme in international 
development at the John E Kennedy School of 
Government at Harvard University in Cam- 
bridge, Massachusetts. I was able to tailor my 
coursework to explore aspects of human health. 


What brought you back to Ethiopia in 2008? 
Ihad done short stints there and in Honduras, 
and I returned as director of programmes 
with a group that worked to alleviate chronic 
food insecurity in rural areas. We developed 
an initiative that provided food and cash 

to 300,000 people. We also planted trees 
throughout the country. 


Why did you decide to pursue a PhD in 
economics? 

I couldn't help thinking, instead of helping 
300,000 people, what if I had the ear of govern- 
ment and could suggest policies that could 
help 7-8 million people? I was inspired by the 
work of Chris Barrett, an applied economist at 
Cornell who works on global food security and 
critiques food-aid projects worldwide. He has 
a lot of influence on governments, which are 
interested in his advice on how to make food- 
security efforts work better. My experience in 
Ethiopia paved the way for me to work on a 
handful of projects in East Africa for my PhD. 


Can you describe some of the projects that you 
worked on in Ethiopia? 

I monitored the use of fuel-efficient stoves. For 
6 months, we tracked 1.7 million temperature 
data points from sensors in people’s homes to 
understand when and how they used the stoves. 
In addition, I worked on a project to turn animal 
bones into a soil fertilizer. These projects aim to 
solve real problems — problems that will never 
be solved just by soil science or by applied eco- 
nomics. We've got to combine insights from all 
these areas to find useful solutions. 


How have these experiences positioned you 
for the job market? 

I have a wider tool kit than does someone who 
has studied just one discipline. I have an eco- 
nomics hammer, but I also have a few others 
to pick from. I want a job at a public-policy 
school — I’m gearing up to apply for more than 
100 academic positions this year. [like working 
with non-governmental organizations, but I feel 
that an academic route will give me the chance 
to design research with people who can provide 
meaningful input on policy discussions. m 


INTERVIEW BY VIRGINIA GEWIN 


This interview has been edited for length and clarity. 
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Utes SCIENCE FICTION 


GHOSTS IN THE MACHINE 


BY AARON MOSKALIK 


CC ello, this is Eric. For quality 
H assurance, this conversation is 
being recorded across all modal- 

ities. How may I help you?” 

“My social-security cheque is late again.” 

“Tm sorry to hear that ... Mr William- 
son. Let me pull up your file. 
Just one moment. Ah, I see 
the problem. You haven't con- 
firmed your vitality status with 
us this month. You should’ve 
received a reminder —” 

“Vitality status?” 

“We need to confirm that 
youre alive, Sir” 

“Tm talking to you aren't I?” 

“Of course, Sir, but the pos- 
sibility exists that, like myself, 
you are a personality proxy 
designed to carry out mundane transactions 
for your primary. It is your primary’s vitality 
we are concerned with” 

“Are you calling me a machine? I want to 
talk with a real person.” 

“Tm sorry, Sir, but that’s not possible at 
this time?” 

“It’s two in the afternoon! Let me talk to 
your manager.’ 

“Of course, Sir. Transferring you now...” 

“Hello Mr Williamson. This is Anne. I 
understand you need assistance verifying 
your vitality status.” 

“Tm alive, dammit! OK, sure. Whatever. 
How do! do that?” 

“The simplest way would be to give us access 
to your health monitor. We accept data from 
any number of devices: FitnessTrack, Skinny- 
Mini, HelpMeUp, FallStall, StillKickin’ —” 

“T don't have any of those. Don't believe in 
them. Anyone can hack in and know what 
you're doing by looking at the data. Total 
invasion of privacy — youre telling me I 
can't get my hard-earned cheque unless I let 
you peeping Toms in on everything I do?” 

“Now calm down, Mr Williamson. I 
understand your qualms —” 

“Dont tell me to calm down. Tell me why 
you need to hassle good tax-paying citizens 
with all these nonsense requirements. When 
I die, you'll damn well know about it —” 

“Actually, that’s not true, Sir. We’ve had 
cases where deceased citizens have contin- 
ued to collect benefits for years, undetected 
by us. All activity continued as before. They 
paid bills, consumed services, sent e-mails, 
posted on social media... all seamlessly 
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Cheque mate. 


maintained by their automated proxies. So, 
hopefully you understand our need to con- 
firm your status.” 

“Mr Williamson?” 

“Yeah, fine. But I’m not sending you any 
personal data logs” 

“That's not strictly necessary. If you prefer, 


you can opt for independent verification. ’m 
sending you a list of local contractors who, for 
a small fee, will visit your home in person.” 
“Sure. Whatever. Thanks for nothing.” 
“Tt’s our pleasure, Mr Williamson. Have a 
nice day.” 


“What was that all about, Harold?” 

“You listening in on me again, Mags?” 

“Let me guess, your cheque is late again. 
When are you going to give them your moni- 
tor feed?” 

“Howd you know about that?” 

“They've required it for years. Get your 
memory checked.” 

“You gave them access to yours?” 

“Why not? Oh, I forgot, they’re going to 
know everything we do. Oooo0, scary. News 
flash: we don’t do anything.” 

“But ... [honestly don't have a monitor...” 

“T know, dear. Listen, I’m going to let you 
in ona secret. Maybe you'll remember it this 
time. Ready? Get on the encrypted chan- 
nel... good, here it is. Neither do I. Down- 
load Health Data Simulator. It’s a free app. 
Set it and forget it” 

“But that’s... fraud.” 

“T forgot, youre still in denial. Fine, do it 
your way. Dinner will be ready in ten minutes” 


“Hello, is this Mr Schlicker? The Social Secu- 
rity Agency gave me your name —” 

“Mr Williamson! Good to hear from you.” 

“Tve called you before?” 

“Every month like clockwork” 

“[ don't remember you... you come by my 
house once a month to make sure I’m alive?” 
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“That’s not necessary. I just file the 
verification paperwork on your behalf. 
Easy-peasy. You get your cheque, I get my 
fee, everyone wins.” 

“So you don't actually verify I’m alive?” 

“Mr Williamson ... are you sitting down?” 

“Y-yes.” 

“No, youre not. You cant sit down because 
youre not real. No one is. Real 
living people havent existed for 
hundreds of years.” 

“Mr Williamson? Snap out 
of it, man” 

“T cant feel my body.” 

“You don't have a body.” 

“T had one a minute ago! 
What did you do to me?” 

“The truth can be very diso- 
rienting. You need to stay with 
me. I'll talk you through it —” 

“This is horrible. I can’t breathe. The 
world has gone black.” 

“Focus on my voice, Mr Williamson” 

“Tt’s not really a voice, is it?” 

“Now youre getting it —” 

“Change it back. Make it all real again” 

“I do have another service I can offer 
you. I’m afraid it’s a little more expensive, 
though —” 

“Anything, just hurry.” 

“Tt’s called personality renormalization. I 
have to warn you, it'll wipe out your memory 
of the last few minutes.” 

“Good. Please...” 

“Beginning now...” 

“Isee... Iseea light” 

“Go towards the light, Mr Williamson” 

“There are people there. Happy people. 
Beautiful people. They do exist! They can't 
see me though. Hello, hello. Over here!” 

“Look for yourself, Mr Williamson.” 

“There I am! I’m not as old as I thought, 
a handsome devil too. I’m being pulled 
towards him ... myself.1...” 

“Are you OK, Mr. Williamson?” 

“What? Who are you?” 

“This is Mr Schlicker. I was just telling 
you why your cheque will be a little light 
this month” 

“Oh. Oh, right. But I will get one?” 

“Absolutely. I'm filing all the necessary 
paperwork now. Take care, Mr Williamson. 
Tl talk to you next month.” m 


Aaron Moskalik is a software architect 
and speculative-fiction writer based near 
Detroit, Michigan. 
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